Albert Cohen, Ronald DeVore, Pencho Petrushev and Hong Xu


Abstract. Given a function $f \in L_2(Q)$, $Q := [0,1)^2$, and a real number $t > 0$, let
$U(f,t) := \inf_{g \in BV(Q)} \|f - g\|_{L_2(Q)}^2 + t\,V_Q(g)$, where the infimum is taken over all functions $g \in BV$
of bounded variation on $Q$. This and related extremal problems arise in several areas of
mathematics  such  as  interpolation  of  operators  and  statistical  estimation,  as  well  as  in  digital 
image processing. Techniques for finding minimizers $g$ for $U(f,t)$ based on variational calculus
and  nonlinear  partial  differential  equations  have  been  put  forward  by  several  authors  ([DMS], 
[LOR],  [MS],  [CL]).  The  main  disadvantage  of  these  approaches  is  that  they  are  numerically 
intensive.  On  the  other  hand,  it  is  well-known  that  more  elementary  methods  based  on  wavelet 
shrinkage  solve  related  extremal  problems,  for  example,  the  above  problem  with  BV  replaced 
by the Besov space $B_1^1(L_1(Q))$ (see e.g. [CDLL]). However, since BV has no simple description
in terms of wavelet coefficients, it is not clear that minimizers for $U(f,t)$ can be realized in
this way. We shall show in this paper that simple methods based on Haar thresholding
provide near minimizers for $U(f,t)$. Our analysis of this extremal problem brings forward
many interesting relations between Haar decompositions and the space BV.


1.  Introduction. 

Nonlinear  approximation  has  recently  played  an  important  role  in  several  problems  of 
image  processing  including  compression,  noise  removal,  and  feature  extraction.  We  have 
in  mind  techniques  such  as  wavelet  compression  [DJL],  wavelet  shrinkage  or  thresholding 
[DJKP1], wavelet packets [CW], and greedy algorithms [MZ, DT]. There has also been an
impressive  contribution  of  techniques  based  on  variational  calculus  and  nonlinear  partial 
differential  equations  (see  e.g.  [DMS],  [LOR],  [MS],  [CL])  especially  to  the  problems  of 
noise  removal  and  image  segmentation.  The  common  point  between  these  two  approaches 
is their ability to adapt to the composite nature of images: edges, textures and smooth
regions  should  be  treated  adaptively,  a  requirement  which  is  certainly  not  fulfilled  by  the 
classical  linear  filtering  techniques. 

One problem which plays an important role in the latter approach is the following
extremal problem introduced in [LOR]:

Given a function (image) $f$ defined on the unit square $Q := [0,1)^2$, and a parameter
$t > 0$, find the function $g \in BV(Q)$ which attains the infimum

(1.1)  $U(f,t) := \inf_{g \in BV(Q)} \|f - g\|_{L_2(Q)}^2 + t\,V_Q(g).$


Here $BV(Q)$ is the space of functions of bounded variation on $Q$ (see §2 for the definition
of this space) and $V_Q(f) = |f|_{BV}$ is the associated semi-norm, i.e. the total variation of $f$.
In the practice of noise removal, $f$ represents the noisy image and $t$ is usually chosen to be
proportional  to  the  noise  level.  The  minimizer  g  then  appears  as  a  denoised  image.  The 
functional  in  (1.1)  can  also  be  viewed  as  a  variant  of  the  Mumford  and  Shah  functional 
introduced  in  their  celebrated  paper  [MS]  on  image  processing. 

A  minimization  problem  close  to  (1.1)  is  also  familiar  in  the  context  of  interpolation  of 
linear  operators:  the  expression 


(1.2)  $K(f,t) := K(f,t,L_2(Q),BV(Q)) := \inf_{g \in BV(Q)} \|f - g\|_{L_2(Q)} + t\,V_Q(g),$

called the K-functional of $f$ for the pair $(L_2(Q), BV(Q))$, is the basic tool for generating
interpolation  spaces  between  these  two  spaces  by  the  so-called  real  method. 

Numerical  techniques  for  solving  (1.1)  based  on  partial  differential  equations  have  been 
developed  and  successfully  applied  to  image  processing.  The  advantage  of  these  techniques 
is high performance. Their disadvantage is that they are numerically intensive, and require in
practice the approximation of the BV term in $U(f,t)$ by a quadratic term (e.g.
$\int (\epsilon + |\nabla f|^2)^{1/2}$) in order to find a solution in reasonable computational time (see [VO] for a
discussion on numerical methods for solving (1.1)).

In comparison, wavelet thresholding methods simply amount to the application of multiscale
decomposition and reconstruction algorithms on the image, and of a thresholding
procedure, which can all be performed in $O(N)$ operations, where $N$ is the number of pixels
in the image. These methods can be made translation invariant by a cyclic averaging
technique introduced in [CD], which seems to bring significant visual improvement, while
only raising the complexity to $O(N \log N)$. From a more theoretical point of view, thresholding
procedures have been proved to be optimal, in the minimax sense of asymptotic
statistics, in various non-parametric contexts where the images are typically modeled by
their regularity in Sobolev and Besov classes (see [DJKP2]).

A  striking  remark  (see  [CDLL])  is  that  wavelet  thresholding  also  provides  the  exact 
solution  to  an  extremal  problem  which  is  very  close  to  (1.1),  namely 


(1.3)  $U(f,t) := \inf_{g \in B_1^1(L_1(Q))} \|f - g\|_{L_2(Q)}^2 + t\,|g|_{B_1^1(L_1(Q))},$


where the Besov space $B_1^1(L_1(Q))$ is taken in place of the (larger) space $BV(Q)$. Both
$BV(Q)$ and $B_1^1(L_1(Q))$ are smoothness spaces of order one in $L_1(Q)$, e.g. the space $BV(Q)$
is the same as $\mathrm{Lip}(1, L_1(Q))$ (see [M], or [DPI] for the definition of the Besov spaces). In
contrast to BV, the $B_1^1(L_1)$ norm has a simple equivalent expression as the $\ell_1$ norm of the
coefficients in a wavelet basis decomposition $f = \sum_{\lambda \in \Lambda} f_\lambda \psi_\lambda$ (where $\Lambda$ denotes the set of

indices  for  the  wavelet  basis).  One  can  thus  use  this  decomposition  to  obtain  an  equivalent 
discrete  problem 


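(1.4)  $\inf_{(g_\lambda)} \sum_{\lambda \in \Lambda} \left[ |f_\lambda - g_\lambda|^2 + t\,|g_\lambda| \right],$

whose solution (obtained by minimizing separately on each index $\lambda$) is exactly given by a
“soft thresholding” procedure at level $t/2$:

(1.5)  $g_\lambda = \mathrm{sgn}(f_\lambda) \max\{0, |f_\lambda| - t/2\}.$
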
The minimization problem (1.3) can thus be solved (up to a constant related to the equivalence
between continuous and discrete norms), by a simple wavelet-based procedure.
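
As an illustration of (1.4) and (1.5), the following Python sketch applies soft thresholding at level $t/2$ to a finite coefficient vector and evaluates the discrete functional. The function names `soft_threshold` and `discrete_cost` and the sample coefficients are our own illustrative choices and do not come from [CDLL].

```python
import numpy as np

def soft_threshold(coeffs, t):
    """Apply (1.5) coefficientwise: sgn(f) * max(0, |f| - t/2)."""
    coeffs = np.asarray(coeffs, dtype=float)
    return np.sign(coeffs) * np.maximum(0.0, np.abs(coeffs) - t / 2.0)

def discrete_cost(f, g, t):
    """The functional in (1.4): sum over indices of |f - g|^2 + t |g|."""
    return float(np.sum((f - g) ** 2 + t * np.abs(g)))

# Hypothetical coefficient vector and parameter, for illustration only.
f = np.array([3.0, -0.4, 0.9, -2.5, 0.0])
t = 1.0
g = soft_threshold(f, t)
print(g, discrete_cost(f, g, t))
```

Because the functional in (1.4) decouples over the indices and each scalar term is convex, any perturbation of `g` can only increase `discrete_cost`, which gives a quick numerical sanity check of (1.5).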

One could argue that the distinction between the two problems (1.1) and (1.3) is slight.
However, BV seems better adapted to model real images, since it allows sharp edges (i.e.
discontinuities on a line), which cannot occur in a bivariate function that belongs to the
smaller space $B_1^1(L_1)$. This fact is confirmed in the practice of image processing: the
performance  of  (1.1)  for  noise  removal,  for  example,  seems  slightly  better  than  that  of 
(1.3),  at  least  in  aesthetic  terms. 

We  call  a  family  of  functions  gt  a  near  minimizer  for  (1.1)  if 

(1.6)  $\|f - g_t\|_{L_2(Q)}^2 + t\,V_Q(g_t) \le C \inf_{g \in BV(Q)} \left[ \|f - g\|_{L_2(Q)}^2 + t\,V_Q(g) \right]$

with $C$ an absolute constant (not depending on $t$ or $f$). A similar definition applies to (1.2).
The  question  arises  whether  one  could  find  a  near  minimizer  to  (1.1)  and  (1.2),  using  simple 
non-linear  approximation  techniques  such  as  wavelet  thresholding.  Note  that  in  contrast 
to $B_1^1(L_1)$, we are then allowed to use approximations that have line discontinuities, such
as  the  multidimensional  Haar  basis  or,  more  generally,  piecewise  constant  functions.  The 
main  point  of  this  paper  is  to  develop  such  techniques  and  to  prove  that  they  indeed  yield 
near  minimizers  for  the  problems  (1.1)  and  (1.2). 

Our main result in this paper is to show that either of the extremal problems (1.1) and (1.2)
has a near minimizer taken from certain “non-linear” spaces $\Sigma_N$, $N \ge 1$, whose elements
are piecewise constants that can be described by $N$ parameters. In the case of wavelet
thresholding, the space $\Sigma_N$ is simply the set of all linear combinations $\sum_\lambda f_\lambda H_\lambda$ with at
most $N$ terms and $H_\lambda$ the bivariate Haar functions.

In order to prove that a given family $\Sigma_N$ provides the solution to (1.1) or (1.2), we
shall make use of several ingredients, among which are two types of inequalities that are
frequently used in numerical analysis and approximation theory:

(i) A direct or Jackson type estimate

(1.7)  $\inf_{g \in \Sigma_N} \|f - g\|_{L_2(Q)} \le C N^{-1/2} |f|_{BV(Q)},$

that describes the approximation power of $\Sigma_N$ for functions in BV.

(ii) An inverse or Bernstein type estimate

(1.8)  $|f|_{BV(Q)} \le C N^{1/2} \|f\|_{L_2(Q)}$ if $f \in \Sigma_N$,

that describes the smoothness properties of the approximation spaces $\Sigma_N$.

When BV is replaced by $B_1^1(L_1)$ and $\Sigma_N$ is the set of $N$-term linear combinations in
a sufficiently smooth wavelet basis, these inequalities reduce to simple considerations on
sequences. Since the BV norm has no simple equivalent expression in terms of the wavelet
coefficients (it is actually known that BV is nonseparable), (1.7) and (1.8) (in particular
the direct estimate) are far less obvious, and will require more involved arguments.
We shall now give a more precise formulation of our results. We shall denote by $\Sigma_N^w$ the
non-linear spaces associated with $N$-term approximation in the Haar system, i.e.

(1.9)  $\Sigma_N^w := \{ \sum_{\lambda \in E} c_\lambda H_\lambda \,;\ E \subset \Lambda,\ |E| \le N \},$

where $|E|$ denotes the cardinality of the discrete set $E$ (in the case of a continuous set
$\Omega$ of $\mathbb{R}^d$, $|\Omega|$ will stand for its volume), and where $(H_\lambda)_{\lambda \in \Lambda}$ is the bivariate Haar system
derived from the univariate system of $L_2[0,1]$ by the usual tensor-product construction:
from $H^0 = \chi_{[0,1)}$ and $H^1 := \chi_{[0,1/2)} - \chi_{[1/2,1)}$ one defines the multivariate functions

(1.10)  $H^e(x) := H^{e_1}(x_1) H^{e_2}(x_2), \quad e = (e_1, e_2) \in V,$

where $V$ is the set consisting of the nonzero vertices of $Q$. The bivariate Haar system for
$L_2(Q)$ consists of the constant function 1 and of all functions

(1.11)  $H^e_{j,k}(x) = 2^j H^e(2^j x - k), \quad e \in V,\ j \ge 0,\ k \in \mathbb{Z}^2 \cap 2^j Q.$

We  refer  to  [D]  for  a  general  introduction  to  wavelet  bases. 
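
To make the construction (1.10)-(1.11) concrete, the following Python sketch samples the bivariate Haar functions at the cell midpoints of a $2^K \times 2^K$ grid and checks orthonormality in $L_2(Q)$ numerically. The function names, the grid sampling, and the truncation to levels $j \le J$ are our own assumptions made for illustration; the sampling is exact only when $J < K$.

```python
import numpy as np
from itertools import product

def h0(y):
    """Univariate H^0: indicator of [0, 1)."""
    return ((y >= 0) & (y < 1)).astype(float)

def h1(y):
    """Univariate H^1: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    return ((y >= 0) & (y < 0.5)).astype(float) - ((y >= 0.5) & (y < 1)).astype(float)

def haar(e, j, k, K):
    """Sample H^e_{j,k}(x) = 2^j H^e(2^j x - k) at the midpoints of a 2^K x 2^K grid on Q."""
    mid = (np.arange(2 ** K) + 0.5) / 2 ** K
    uni = (h0, h1)
    a = uni[e[0]](2 ** j * mid - k[0])   # factor depending on x_1
    b = uni[e[1]](2 ** j * mid - k[1])   # factor depending on x_2
    return 2 ** j * np.outer(a, b)

def haar_system(J, K):
    """The constant function 1 together with all H^e_{j,k}, 0 <= j <= J."""
    vertices = [(1, 0), (0, 1), (1, 1)]  # nonzero vertices of Q
    funcs = [np.ones((2 ** K, 2 ** K))]
    for j in range(J + 1):
        # k ranges over the integer points of 2^j Q
        for e, k in product(vertices, product(range(2 ** j), repeat=2)):
            funcs.append(haar(e, j, k, K))
    return funcs

funcs = haar_system(J=2, K=3)
# Inner products in L_2(Q): the mean over the grid equals the integral for these step functions.
gram = np.array([[np.mean(u * v) for v in funcs] for u in funcs])
print(len(funcs), np.allclose(gram, np.eye(len(funcs))))
```

With $J = 2$ and $K = 3$ the system has $1 + 3(1 + 4 + 16) = 64$ elements, which matches the dimension of the piecewise constants on the $8 \times 8$ grid.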

We shall prove that the wavelet thresholding, which is equivalent to approximation by
the elements of $\Sigma_N^w$, gives a near minimizer to the extremal problems (1.1) and (1.2) (§9).
However, our proofs are neither direct nor simple. Rather, we prove these results by
considering various types of nonlinear approximation by piecewise constants. Note that
the functions in $\Sigma_N^w$ are piecewise constant taking at most 2 N values.

To describe the other spaces of piecewise constant functions which we shall use in this
paper we introduce the following notation which will be used throughout the paper. If $\Omega$
is a set of $\mathbb{R}^2$, we denote by $\varphi_\Omega$ its characteristic function, and by

(1.12)  $a_\Omega(f) = |\Omega|^{-1} \int_\Omega f,$

the average of an $L_1$-function $f$ on $\Omega$. By definition, a dyadic cube $I$ is the tensor product
of two dyadic intervals, i.e. $I = I(j,k,l) = [2^{-j}k, 2^{-j}(k+1)) \times [2^{-j}l, 2^{-j}(l+1))$. We shall
denote by $\mathcal{D} := \mathcal{D}(Q)$ the set of all dyadic cubes contained in $Q$, and by $\mathcal{D}_k(Q)$ the set of
all dyadic cubes in $\mathcal{D}(Q)$ with sidelength $2^{-k}$ (measure $2^{-2k}$). We denote by $\mathcal{S}_k := \mathcal{S}_k(Q)$
the space of piecewise constants on the partition $\mathcal{D}_k(Q)$. This is a linear space spanned by
the functions $\varphi_I$, $I \in \mathcal{D}_k(Q)$.

We  define  the  family  of  non-linear  spaces  of  piecewise  constant  functions: 

(1.13)  $\Sigma_N^c := \{ \sum_{I \in E} c_I \varphi_I \,;\ E \subset \mathcal{D},\ |E| \le N \},$

i.e.  all  linear  combinations  of  at  most  N  characteristic  functions  of  dyadic  cubes. 

A natural procedure to approximate in $\Sigma_N^w$ is the simple thresholding of wavelet coefficients.
In order to obtain approximations in $\Sigma_N^c$, one can think of different procedures.
The simplest one is based on a quadtree splitting algorithm: given a tolerance $\epsilon > 0$ and a
function $f \in L_2(Q)$, one builds an adaptive partition of $Q$ into dyadic cubes by splitting
into four subcubes each cube $I$ such that the residual

$R(I) := \|f - a_I(f)\|_{L_2(I)},$

is larger than $\epsilon$. The procedure is initiated from the unit cube $Q$, and stops when all
residuals are smaller than $\epsilon$, and $f$ is then approximated by $f_\epsilon := \sum_{I \in \mathcal{P}_\epsilon} a_I(f) \varphi_I$, where $\mathcal{P}_\epsilon$
is the final partition of $Q$.
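
The quadtree splitting procedure just described can be sketched in Python as follows. This is a minimal sketch under our own assumptions: the function $f$ is given as a $2^K \times 2^K$ array of cell values (so $f$ is itself piecewise constant at the finest level), cubes are encoded as hypothetical `(row, col, size)` triples in cell units, and the name `quadtree_split` is ours.

```python
import numpy as np

def quadtree_split(f, eps):
    """Split dyadic cubes while the residual R(I) = ||f - a_I(f)||_{L_2(I)} exceeds eps.

    Returns the final partition as a list of (row, col, size) blocks and the
    piecewise constant approximation f_eps.
    """
    n = f.shape[0]                   # f is assumed square with n a power of two
    cell_area = 1.0 / n ** 2
    approx = np.empty(f.shape, dtype=float)
    leaves = []
    stack = [(0, 0, n)]              # start from the unit cube Q
    while stack:
        r, c, s = stack.pop()
        block = f[r:r + s, c:c + s]
        mean = block.mean()          # a_I(f)
        residual = np.sqrt(np.sum((block - mean) ** 2) * cell_area)
        if residual > eps and s > 1:
            h = s // 2               # split I into its four dyadic subcubes
            stack.extend([(r, c, h), (r, c + h, h), (r + h, c, h), (r + h, c + h, h)])
        else:
            approx[r:r + s, c:c + s] = mean
            leaves.append((r, c, s))
    return leaves, approx

# Hypothetical test image: the indicator of a half-plane with an oblique edge.
n = 64
x = (np.arange(n) + 0.5) / n
f = (x[:, None] + x[None, :] < 0.7).astype(float)
leaves, approx = quadtree_split(f, eps=0.05)
print(len(leaves), np.sqrt(np.mean((f - approx) ** 2)))
```

Since the cubes of the final partition are disjoint, the squared $L_2(Q)$ error is the sum of the squared residuals, hence at most $\epsilon^2$ times the number of cubes.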

The  approximation  properties  of  such  adaptive  algorithms  have  been  studied  in  [DY]. 
However, this algorithm does not exploit the full approximation properties of $\Sigma_N^c$ since
it imposes that the cubes involved in the definition of $f_\epsilon$ are disjoint. One can actually
show  by  simple  counterexamples  that  this  procedure  does  not  yield  the  direct  estimate  we 
desire  in  proving  (1.1)  or  (1.2),  i.e.  too  many  cubes  could  be  generated  to  achieve  a  certain 
accuracy  in  the  approximation  of  certain  BV  functions. 

A more efficient procedure should thus not only involve splitting, but also merging
of cubes, which will amount to using non-disjoint cubes in the definition of a suitable
approximation. In this paper, we shall introduce a “split and merge” algorithm that
produces an approximation of $f$ based on disjoint partitions of $Q$ into dyadic rings. By
definition a dyadic ring is the difference between two embedded dyadic cubes, i.e. any set
of the type

(1.14)  $K := I \setminus J, \quad J \subset I, \quad I, J \in \mathcal{D}.$

We also consider a dyadic cube to be a degenerate case of a dyadic ring for which $J$ is
empty. Throughout this paper, a “cube” will always stand for a dyadic cube, and a “ring”
for a dyadic ring. Our third family of approximation spaces $\Sigma_N^r$ is the set of all functions
of the form

(1.15)  $\sum_{\Omega \in \mathcal{P}} c_\Omega \varphi_\Omega,$

where $\mathcal{P}$ is a set of at most $N$ dyadic rings that form a partition of $Q$, i.e. the rings are
disjoint and their union is $Q$. Note that (1.14) means that each ring $\Omega = I \setminus J$ satisfies
$\varphi_\Omega = \varphi_I - \varphi_J$, so that $\Sigma_N^r \subset \Sigma_{2N}^c$.
We can thus use $\Sigma_N^r$ to prove results on approximation by $\Sigma_N^c$.

An important point that should be mentioned here is that the nonlinearity of the three
families $\Sigma_N^w$, $\Sigma_N^c$ and $\Sigma_N^r$ is “controlled” in the sense that they all satisfy

(1.16)  $\Sigma_N + \Sigma_M \subset \Sigma_{a(M+N)},$

with $a$ an absolute constant. This is obvious in the case of $\Sigma_N^w$ and $\Sigma_N^c$, with $a = 1$. It can
also be proved for $\Sigma_N^r$ (with a larger value of $a$).

The  outline  of  our  paper  is  the  following: 

In §2, we define the spaces $BV(\Omega)$ for domains $\Omega \subset \mathbb{R}^2$ and recall certain basic properties
of these spaces. In §3, we prove inverse estimates of the type (1.8) for the spaces $\Sigma_N^w$, $\Sigma_N^c$
and $\Sigma_N^r$.

In order to study the process of approximation for $\Sigma_N^r$, we prove in §4 the projection
error estimate

(1.17)  $\|f - a_\Omega(f)\|_{L_2(\Omega)} \le C_1 |f|_{BV(\Omega)},$

where $C_1$ is independent of the ring $\Omega$. We then prove in §5 the stability estimate

(1.18)  $\left| \sum_{\Omega \in \mathcal{P}} a_\Omega(f) \varphi_\Omega \right|_{BV(Q)} \le C_2 |f|_{BV(Q)},$

where $C_2$ does not depend on the partition $\mathcal{P}$ of $Q$ into disjoint rings. The uniformity of
$C_1$ and $C_2$ is ensured by the controlled shape of a dyadic ring, which cannot be very
anisotropic.

In §6, we introduce our algorithm for approximation by the elements of $\Sigma_N^r$ and use
it to prove the Jackson inequality. This algorithm relies on a general result concerning
the existence of partitions of $Q$ into rings which are well balanced with respect to a super-additive
cost function. We prove in §7 that this algorithm is also a near best solution to the
extremal problem (1.2). We anticipate therefore that this algorithm will be useful in image
processing, but this will not be addressed in the present paper, which mostly concentrates
on the theoretical issues.

In §8, we prove the direct estimate for (Haar) wavelet shrinkage, i.e. approximation by
$\Sigma_N^w$, and we show in §9 that this procedure is stable in BV and provides solutions for the
two extremal problems (1.1) and (1.2). It should be pointed out that the results of these
two sections make important use of the results that we establish for $\Sigma_N^r$, and that so far
we do not know how to prove them in a more direct way.

Finally, we use our results in §10 to identify the interpolation spaces between $L_2(Q)$
and $BV(Q)$.

Throughout  the  paper,  we  give  explicit  constants  for  all  important  inequalities.  Most 
of them (in particular $C_0, C_1, \ldots, C_9$ that appear in the end of the paper), can probably
be  improved  using  more  refined  arguments. 

2. The space $BV(\Omega)$.

In this section, we shall define for certain domains $\Omega \subset \mathbb{R}^2$ the spaces $BV(\Omega)$ of functions
of bounded variation on $\Omega$ and recall some basic properties of this space. While $BV(\Omega)$
can be defined for general domains, in this paper, we shall primarily be interested in rings
$\Omega = I \setminus J$, where $I$ and $J \subset I$ are in $\mathcal{D}(Q)$.

For a vector $\mu \in \mathbb{R}^2$, we define the difference operator in the direction $\mu$ by

(2.1)  $\Delta_\mu(f, x) := f(x + \mu) - f(x).$

Let $\Omega$ be any domain in $\mathbb{R}^2$. For functions $f$ defined on $\Omega$, $\Delta_\mu(f, x)$ is defined whenever
$x \in \Omega(\mu)$, where $\Omega(\mu) := \{x : [x, x + \mu] \subset \Omega\}$ and $[x, x + \mu]$ is the line segment connecting
$x$ and $x + \mu$. Note that if $\Omega$ is bounded and $\mu$ is large enough then $\Omega(\mu)$ is empty. Let
$e_j$, $j = 1, 2$, be the two coordinate vectors in $\mathbb{R}^2$. We say that a function $f \in L_1(\Omega)$ is in
$BV(\Omega)$ if and only if

(2.2)  $V_\Omega(f) := \sup_{h > 0} h^{-1} \sum_{j=1}^2 \|\Delta_{h e_j}(f, \cdot)\|_{L_1(\Omega(h e_j))} = \lim_{h \to 0} h^{-1} \sum_{j=1}^2 \|\Delta_{h e_j}(f, \cdot)\|_{L_1(\Omega(h e_j))}$

is finite. Here, the last equality in (2.2) follows from the fact that $\|\Delta_{h e_j}(f, \cdot)\|_{L_1(\Omega(h e_j))}$
is subadditive in $h$ (see e.g. Theorem 7.11.1 in [HP]). By definition, the quantity $V_\Omega(f)$ is the
variation of $f$ over $\Omega$. It provides a semi-norm and norm for $BV(\Omega)$:

(2.3)  $|f|_{BV(\Omega)} := V_\Omega(f); \quad \|f\|_{BV(\Omega)} := |f|_{BV(\Omega)} + \|f\|_{L_1(\Omega)}.$

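Let $\Omega = \Omega_1 \cup \Omega_2$ where $\Omega_1$ and $\Omega_2$ are disjoint sets. Then for any $h > 0$ and $j = 1, 2$,
one has the inclusion $\Omega_1(h e_j) \cup \Omega_2(h e_j) \subset \Omega(h e_j)$. Hence, for $j = 1, 2$,

(2.4)  $\|\Delta_{h e_j}(f, \cdot)\|_{L_1(\Omega_1(h e_j))} + \|\Delta_{h e_j}(f, \cdot)\|_{L_1(\Omega_2(h e_j))} \le \|\Delta_{h e_j}(f, \cdot)\|_{L_1(\Omega(h e_j))}.$

Summing over $j$, dividing by $h$, and taking the limit as $h$ tends to 0, we obtain

(2.5)  $V_{\Omega_1}(f) + V_{\Omega_2}(f) \le V_\Omega(f).$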

By  induction,  the  analogue  of  (2.5)  holds  for  any  finite  union  of  disjoint  sets. 

We recall the $L_1$-modulus of continuity $\omega(f, t)_\Omega$, which is defined by

(2.6)  $\omega(f, t)_\Omega := \sup_{|\mu| \le t} \|\Delta_\mu(f, \cdot)\|_{L_1(\Omega(\mu))}.$

Here and later $|x| := \sqrt{x_1^2 + x_2^2}$ is the Euclidean norm. For any ring $\Omega$, we have that $BV(\Omega)$
is identical with $\mathrm{Lip}(1, L_1(\Omega))$, where the latter set consists of all functions such that

(2.7)  $|f|_{\mathrm{Lip}(1, L_1(\Omega))} := \sup_{t > 0} t^{-1} \omega(f, t)_\Omega$

is finite. We also have

(2.8)  $|f|_{\mathrm{Lip}(1, L_1(\Omega))} \le |f|_{BV(\Omega)} \le 2 |f|_{\mathrm{Lip}(1, L_1(\Omega))}.$

Indeed, the right inequality in (2.8) is obvious from the definition of the two semi-norms.
The left inequality follows from the fact that for any point $x \in \Omega(\mu)$, $\mu = (\mu_1, \mu_2)$, either
$[x, x + \mu_1 e_1]$ and $[x + \mu_1 e_1, x + \mu]$ are both contained in $\Omega$ or $[x, x + \mu_2 e_2]$ and $[x + \mu_2 e_2, x + \mu]$
are both contained in $\Omega$.

For a ring $\Omega = I \setminus J$, we define $\mathcal{D}(\Omega)$ to be the set of all cubes in $\mathcal{D}$ which are contained in $\Omega$
and similarly, we define $\mathcal{D}_k(\Omega)$ to be the subset of $\mathcal{D}(\Omega)$ that consists of the cubes of sidelength
$2^{-k}$. If $2^{-2k} \le |J|$ when $J$ is nonempty, or if $2^{-2k} \le |I|$ when $\Omega = I$ is a cube, we can
define $\mathcal{S}_k(\Omega)$ to be the restriction of $\mathcal{S}_k$ to $\Omega$. For any $f \in L_1(\Omega)$, we define $P_k(f)$ to
be the orthogonal projection of $f$ onto $\mathcal{S}_k(\Omega)$. Then,

(2.9)  $P_k(f) = \sum_{I \in \mathcal{D}_k(\Omega)} a_I(f) \varphi_I.$


It is easy to prove that whenever $f \in BV(\Omega)$

(2.10)  $\|f - P_k(f)\|_{L_1(\Omega)} \le 2^{-k} V_\Omega(f)$

and

(2.11)  $V_\Omega(P_k(f)) \le V_\Omega(f).$

For a proof of these results see [L, Chapter 3, Lemma 3.2] for the case when $\Omega$ is a cube
(the same proof also works for rings).

It is also easy to calculate the BV norm of functions $S \in \mathcal{S}_k(\Omega)$. For any set $A \subset \mathbb{R}^2$,
let $\mathcal{L}_k(A)$ denote the edges $L$ of the cubes $I \in \mathcal{D}_k(Q)$ which are contained in $A$. We also
denote by $\Omega^\circ$ the interior of $\Omega$, and by $J_L$, $L \in \mathcal{L}_k(\Omega^\circ)$, the jump in $S$ across $L$. Then,
(see again [L, Chapter 3, Lemma 3.1])

(2.12)  $V_\Omega(S) = 2^{-k} \sum_{L \in \mathcal{L}_k(\Omega^\circ)} |J_L|.$
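
For $\Omega = Q$, the jump formula (2.12) and the projection (2.9) are easy to experiment with numerically. The Python sketch below is only illustrative: it represents $S \in \mathcal{S}_k(Q)$ as a $2^k \times 2^k$ array of cell values, and the names `variation` and `project` are our own. It checks the monotonicity (2.11) on a random piecewise constant function.

```python
import numpy as np

def variation(S):
    """V_Q(S) by (2.12): 2^-k times the sum of |jumps| across the interior edges."""
    h = 1.0 / S.shape[0]
    jumps = np.abs(np.diff(S, axis=0)).sum() + np.abs(np.diff(S, axis=1)).sum()
    return h * jumps

def project(S, k):
    """P_k(S): the averages of S on the dyadic cubes of sidelength 2^-k, as in (2.9).

    S is given on a grid at least as fine as 2^k x 2^k; the result is the
    2^k x 2^k array of averages a_I(S).
    """
    n = S.shape[0]
    m = 2 ** k
    b = n // m                       # number of fine cells per coarse cell and direction
    return S.reshape(m, b, m, b).mean(axis=(1, 3))

# Indicator of the quadrant [0, 1/2)^2: its interior boundary has total length 1.
quadrant = np.array([[1.0, 0.0], [0.0, 0.0]])
print(variation(quadrant))

rng = np.random.default_rng(1)
S = rng.standard_normal((32, 32))    # hypothetical element of S_5(Q)
print([variation(project(S, k)) <= variation(S) for k in range(6)])
```

Edges lying on the boundary of $Q$ never contribute, since only edges in the interior $Q^\circ$ enter the sum in (2.12); this is why `np.diff` over adjacent cells is sufficient here.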


3.  Inverse  estimates. 

In the introduction, we have introduced three families of non-linear spaces ($\Sigma_N^w$, $\Sigma_N^c$
and $\Sigma_N^r$). We begin our study of these spaces in this section by proving (1.8) for any
ring $\Omega$. We shall obtain specific constants in (1.8) although this is not important for the
theoretical results that follow.

We first treat the space $\Sigma_N^w$ which appears in wavelet thresholding.

Theorem 3.1. For each $f \in \Sigma_N^w$, we have

(3.1)  $V_Q(f) \le 8 N^{1/2} \|f\|_{L_2(Q)}.$

Proof. We first observe that any Haar basis function $\psi_\lambda$ (see (1.11)) satisfies

(3.2)  $V_Q(\psi_\lambda) \le 8 = 8 \|\psi_\lambda\|_{L_2}.$

Indeed, if the support of $\psi_\lambda$ is a square $I$ of side length $h = 2^{-k}$, then it takes the values
$\pm h^{-1}$ on $I$. We can calculate $V_Q(\psi_\lambda)$ by (2.12). The jumps across the outer boundary of
$I$ give $h^{-1} \cdot 4h = 4$ and those across the inner boundary give at most $2h^{-1} \cdot 2h = 4$. Thus,
(3.2) is proved.

If $f = \sum_{\lambda \in E} f_\lambda \psi_\lambda$ is in $\Sigma_N^w$, then

(3.3)  $V_Q(f) \le 8 \sum_{\lambda \in E} |f_\lambda| \le 8 |E|^{1/2} \left[ \sum_{\lambda \in E} |f_\lambda|^2 \right]^{1/2} \le 8 N^{1/2} \|f\|_{L_2},$

by the Cauchy-Schwarz inequality. □

Remark 3.1. Using that $V_Q(f) \le 8 \sum_{\lambda \in E} |f_\lambda|$, we also obtain the following variant of the
inverse inequality (3.1): Let $t > 0$ and $f = \sum_{\lambda \in E} f_\lambda \psi_\lambda$ be a linear combination of Haar
wavelets such that $|f_\lambda| > t$ for all $\lambda \in E$, then

(3.4)  $|f|_{BV} \le \frac{8}{t} \|f\|_{L_2}^2.$
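
The inverse estimate (3.1) can be checked numerically on a random $N$-term Haar combination. The following Python sketch is an illustration only: the grid representation, the helper names `haar` and `variation`, and the random choice of indices and coefficients are our own assumptions. The variation is computed by the jump formula (2.12), and the last line of the usage checks that an interior Haar function of type $e = (1,1)$ attains the value 8 in (3.2).

```python
import numpy as np

def haar(e, j, k, K):
    """Sample H^e_{j,k} of (1.11) at the cell midpoints of a 2^K x 2^K grid (requires j < K)."""
    mid = (np.arange(2 ** K) + 0.5) / 2 ** K

    def factor(bit, shift):
        y = 2 ** j * mid - shift
        inside = ((y >= 0) & (y < 1)).astype(float)
        if bit == 0:
            return inside                              # H^0, indicator of [0, 1)
        return inside * np.where(y < 0.5, 1.0, -1.0)   # H^1

    return 2 ** j * np.outer(factor(e[0], k[0]), factor(e[1], k[1]))

def variation(S):
    """V_Q(S) via (2.12): cell side times the sum of |jumps| across interior edges."""
    h = 1.0 / S.shape[0]
    return h * (np.abs(np.diff(S, axis=0)).sum() + np.abs(np.diff(S, axis=1)).sum())

rng = np.random.default_rng(0)
K, N = 5, 10
vertices = [(1, 0), (0, 1), (1, 1)]
terms = set()
while len(terms) < N:                                  # N distinct random Haar indices
    j = int(rng.integers(0, K))
    e = vertices[int(rng.integers(0, 3))]
    k = (int(rng.integers(0, 2 ** j)), int(rng.integers(0, 2 ** j)))
    terms.add((e, j, k))
coeffs = rng.standard_normal(N)
f = sum(c * haar(e, j, k, K) for c, (e, j, k) in zip(coeffs, sorted(terms)))

lhs = variation(f)                                     # V_Q(f)
rhs = 8 * np.sqrt(N) * np.sqrt(np.mean(f ** 2))        # 8 N^{1/2} ||f||_{L_2(Q)}
print(lhs <= rhs)
print(variation(haar((1, 1), 2, (1, 1), 4)))           # interior support, type (1, 1)
```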

We now prove the Bernstein inequality for $\Sigma_N^r$ by a very similar argument.

Theorem 3.2. For each $f \in \Sigma_N^r$, we have

(3.5)  $V_Q(f) \le \frac{4\sqrt{5}}{\sqrt{3}} N^{1/2} \|f\|_{L_2(Q)}.$

Proof. We first prove that if $\Omega = I \setminus J$ is any ring contained in $Q$, then

(3.6)  $|\varphi_\Omega|_{BV} \le \frac{4\sqrt{5}}{\sqrt{3}} \|\varphi_\Omega\|_{L_2}.$

To prove this, let $\ell$ be the side length of $I$ and $h\ell$ be the side length of $J$. Then,
$\|\varphi_\Omega\|_{L_2(Q)}^2 = \ell^2(1 - h^2)$. We consider two cases. In the first case, we assume that $J$ is in the interior of
$I$. Then necessarily, $h \le 1/4$. In this case $V_Q(\varphi_\Omega) \le 4\ell + 4\ell h = 4\ell(1 + h)$ where the first
term comes from the jump across the outer boundary and the second the jump across the
inner boundary. Since $\frac{(1+h)^2}{1-h^2} \le \frac{5}{3}$, we have verified (3.6) in this case. In the second case,
we assume that $J$ shares an edge with $I$. Then $V_Q(\varphi_\Omega) \le (4\ell - \ell h) + 3\ell h = 4\ell(1 + h/2)$.
Since $\frac{(1+h/2)^2}{1-h^2} \le 25/12$ for $0 < h \le 1/2$, (3.6) follows in this case as well.

If $f \in \Sigma_N^r$, then $f = \sum_{\Omega \in \mathcal{P}} f_\Omega \varphi_\Omega$ with $\mathcal{P}$ a partition of $Q$ into rings, and

(3.7)  $V_Q(f) \le \frac{4\sqrt{5}}{\sqrt{3}} \sum_{\Omega \in \mathcal{P}} |f_\Omega| \|\varphi_\Omega\|_{L_2} \le \frac{4\sqrt{5}}{\sqrt{3}} N^{1/2} \|f\|_{L_2(Q)}$

by the Cauchy-Schwarz inequality. □
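
The constant in (3.6) can be examined numerically on explicit rings. The Python sketch below is illustrative only: rings are rasterized on a $64 \times 64$ grid, cubes are encoded as hypothetical `(row, col, size)` triples in cell units, and the helper names are ours. It computes the quotient $V_Q(\varphi_\Omega)/\|\varphi_\Omega\|_{L_2}$ for a ring with $J$ interior to $I$ and $h = 1/4$, and for a ring where $J$ shares exactly one edge with $I$.

```python
import numpy as np

def variation(S):
    """V_Q(S) via the jump formula (2.12) on a uniform grid of cells."""
    h = 1.0 / S.shape[0]
    return h * (np.abs(np.diff(S, axis=0)).sum() + np.abs(np.diff(S, axis=1)).sum())

def ring_indicator(n, outer, inner):
    """Characteristic function of I \\ J on an n x n grid; cubes are (row, col, size) in cells."""
    phi = np.zeros((n, n))
    r, c, s = outer
    phi[r:r + s, c:c + s] = 1.0
    r, c, s = inner
    phi[r:r + s, c:c + s] = 0.0
    return phi

def quotient(phi):
    """V_Q(phi) / ||phi||_{L_2(Q)} for a grid function phi."""
    return variation(phi) / np.sqrt(np.mean(phi ** 2))

n = 64
bound = 4 * np.sqrt(5) / np.sqrt(3)

# I = [1/4, 1/2)^2 and J of side 1/16 in the interior of I, so h = 1/4.
ratio_interior = quotient(ring_indicator(n, outer=(16, 16, 16), inner=(20, 20, 4)))
# Same I, with J of side 1/16 sharing exactly one edge with I.
ratio_edge = quotient(ring_indicator(n, outer=(16, 16, 16), inner=(16, 20, 4)))
print(ratio_interior, ratio_edge, bound)
```

In these two examples the interior configuration with $h = 1/4$ gives equality in (3.6), while the edge configuration stays below the bound.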

We close this section by using ideas from [DP] to prove the Bernstein inequality for $\Sigma_N^c$.
If $E$ is a finite collection of dyadic cubes, then for each $I \in E$ we define $B_I(E)$ to be the
set of all cubes $J$ that are maximal in $I$, i.e., $J \subset I$, $J \in E$, and $J$ is not contained in
another cube with these properties. It was shown in Lemma 6.1 of [DP] that any set $E$
can be embedded in a set $E'$ with $|E'| \le 4|E|$ and

(3.8)  $|B_I(E')| \le 4$, for all $I \in E'$.


Theorem 3.3. For each $f \in \Sigma_N^c$, we have

(3.9)  VQ(/)  <  (|iV'£ |/||il(0l. 


Proof. If $f \in \Sigma_N^c$, we can write $f = \sum_{I \in E} f_I \varphi_I$, where $E \subset \mathcal{D}(Q)$ and $|E| \le N$. Let $E'$
be a set which contains $E$, satisfies (3.8), and such that $|E'| \le 4N$. Then, we can also
represent $f$ as

(3.10)  $f = \sum_{I \in E'} d_I \varphi_I.$

Figure 1


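If $I \in E'$, we define $I' := I \setminus \bigcup \{J : J \in B_I(E')\}$. The functions $\varphi_{I'}$, $I \in E'$, have disjoint
supports and

(3.11)  $f = \sum_{I \in E'} c_I \varphi_{I'}$

with $c_I := \sum_{J \in E',\, J \supseteq I} d_J$. We can assume that all $\varphi_{I'}$ appearing in (3.11) are nonzero.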

For  each  of  these  functions,  we  have  a  basic  inverse  estimate 

14 

(3-12)  Yq{lpp)  <  -j=\\lpi,\\l2. 

The proof of (3.12) is similar to that of (3.2) and (3.6) except that we have to check more
cases. The quotient

$V_Q(\varphi_{I'}) / \|\varphi_{I'}\|_{L_2}$

takes its largest value for the configuration in Figure 1, which gives the constant in (3.12). We
leave this verification to the reader.

Using the Cauchy-Schwarz inequality, we find

$V_Q(f) \le \sum_{I \in E'} |c_I| V_Q(\varphi_{I'})$


4.  Approximation  by  a  constant  on  a  ring-shaped  domain. 

In this section, we shall give bounds for the $L_2$-error of approximation of a BV function
by a constant on a ring-shaped domain. At first, we shall make certain preliminary
constructions which will be used in the proofs of these results as well as those of the next
section.

Let $\Omega$ be a ring contained in $Q$: $\Omega := I_1 \setminus I_0$, $I_0, I_1 \in \mathcal{D}(Q)$, $I_0 \subset I_1$. We shall consider
piecewise constant functions in $\mathcal{S}_k(\Omega)$. We assume that $k$ is large enough that $2^{-2k} \le |I_1|$
and $2^{-2k} \le |I_0|$ if $I_0$ is not empty. We can therefore write $|I_1| = m_1^2 2^{-2k}$ and $|I_0| = m_0^2 2^{-2k}$
with $m_0, m_1$ positive integers and $m_0 < m_1$.

Let $\mathcal{B}_k(\Omega)$ denote the external layer of boundary cubes for $\Omega$, i.e. the set of cubes
$I \in \mathcal{D}_k(\mathbb{R}^2)$ such that $I$ is not in $\mathcal{D}_k(\Omega)$ but $\bar{I} \cap \bar{\Omega}$ contains a line segment. Let $(a, b)$ be
the lower left vertex of $I_1$. We index each cube $I \in \mathcal{D}_k(I_1)$ by the pair of integers $(i, j)$,
$1 \le i, j \le m_1$, such that $(a, b) + 2^{-k}(j - 1/2, i - 1/2)$ is in $I$ (we have purposefully reversed $i$
and $j$ in the indexing so that $i$ will now correspond to a row and $j$ to a column). Boundary
cubes can be indexed in the same way with $i, j$ now allowed to take the values 0 and $m_1 + 1$.
Note that, in general, there are two types of boundary cubes: the interior boundary cubes
(which are contained in $I_0$) and the exterior boundary cubes which are outside of $I_1$. If $I$
is indexed by $(i, j)$, we say that $I$ is in row $i$ and column $j$. We say a row $i$ (respectively
column $j$) is unobstructed if all cubes $I \in \mathcal{D}_k(I_1)$ from row $i$ (respectively column $j$) are in
$\mathcal{D}_k(\Omega)$.

By an admissible path $p$ for $\Omega$, we shall mean a piecewise linear path with the following
properties. Each segment of $p$ is parallel to a coordinate axis and connects the center of a
cube $I \in \mathcal{D}_k(\Omega) \cup \mathcal{B}_k(\Omega)$ to the center of another cube $J \in \mathcal{D}_k(\Omega) \cup \mathcal{B}_k(\Omega)$. Each edge
$L \in \mathcal{L}_k(\Omega^\circ \cup \partial\Omega)$ is traversed at most once by $p$ and each edge not in this set is never
traversed by $p$.

For each $i = 1, \ldots, m_1$, there are either two or four boundary cubes in $\mathcal{B}_k(\Omega)$ which are
in row $i$. For each distinct pair of these cubes $(I, J)$, we shall construct an admissible path
$p_i(I, J)$ which connects $I$ to $J$ as follows.

If there are exactly two such boundary cubes for row $i$, we take the strictly horizontal
path which connects the center of $I$ to the center of $J$.

Consider next the case where there are four boundary cubes in row $i$. The indices of
these cubes are $(i, j)$, $j = j_0, j_1, j_2, j_3$, where $j_0 = 0 < j_1 < j_2 < j_3 = m_1 + 1$. Moreover,
$j_1 > m_0$ and $j_3 - j_2 > m_0$. Let $I$ and $J$ be two of these boundary cubes with indices $(i, j)$
and $(i, j')$ and $j < j'$. If $j = j_0$ and $j' = j_1$, we take the path $p_i(I, J)$ to again be the
strictly horizontal path connecting the center of $I$ to the center of $J$. We proceed similarly
if $j = j_2$ and $j' = j_3$.

We now consider the remaining cases. Let $j(i) \in [1, m_0]$ be congruent to $i$ mod $m_0$.
Then, the column with index $j(i)$ is unobstructed. Similarly, the column with index
$j'(i) := m_1 - j(i) + 1$ is unobstructed. Also, for one of the two choices $i_1 := i \pm m_0$, the
row with index $i_1$ is unobstructed.

If $I$, $J$ are a pair for which we have not yet constructed $p_i(I, J)$, then we construct
this path as the concatenation of the five segments which connect the centers of the
cubes with the following indices in the specified order: $(i, j)$, $(i, j(i))$, $(i_1, j(i))$, $(i_1, j'(i))$,
$(i, j'(i))$, $(i, j')$. It follows that $p_i(I, J)$ is an admissible path.
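
The five-segment construction can be sketched in Python as follows. This is a sketch under stated assumptions: the parameter `r0` (the number of rows of $I_1$ lying below $I_0$, so that $I_0$ occupies rows $r_0 + 1, \ldots, r_0 + m_0$) is a hypothetical encoding of the position of $I_0$ that we introduce for illustration, and we assume, as holds for dyadic cubes, that `r0` is a multiple of $m_0$ and $m_1 \ge 2 m_0$. The function only lists the six waypoints; it does not verify admissibility.

```python
def five_segment_path(i, j, j_prime, m0, m1, r0):
    """Waypoints (row, column) of the five-segment row path joining (i, j) to (i, j').

    r0 is a hypothetical parameter: I_0 is assumed to occupy rows r0+1..r0+m0
    of I_1, with r0 a multiple of m0 and m1 >= 2*m0.
    """
    assert r0 < i <= r0 + m0, "row i must be obstructed by I_0"
    ji = (i - 1) % m0 + 1            # j(i) in [1, m0], congruent to i mod m0
    ji_prime = m1 - ji + 1           # j'(i), the mirrored column
    # Choose the one of i - m0, i + m0 that indexes a row of I_1 avoiding I_0.
    i1 = i - m0 if i - m0 >= 1 else i + m0
    assert 1 <= i1 <= m1 and not (r0 < i1 <= r0 + m0)
    return [(i, j), (i, ji), (i1, ji), (i1, ji_prime), (i, ji_prime), (i, j_prime)]

# Hypothetical example: m1 = 8, m0 = 2, I_0 in rows 3..4 and columns 5..6.
# In row 3 the boundary cubes have columns j0 = 0, j1 = 5, j2 = 6, j3 = 9.
path = five_segment_path(3, 5, 6, m0=2, m1=8, r0=2)
print(path)
```

In this example the path leaves the interior boundary cube $(3, 5)$ to the left, detours through the unobstructed row $i_1 = 1$, and returns to the interior boundary cube $(3, 6)$ from the right.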

We shall need one last type of row path that occurs only in the case that row $i$ is
obstructed but there are only two boundary cubes. This case occurs when $I_0$ touches the
boundary of $I_1$. Let $I$ be the boundary cube in row $i$ which touches the boundary of $I_1$.
We assume that $I$ has index $(i, 0)$ (the case when $I$ has index $(i, m_1 + 1)$ is handled in a
symmetric manner). We let $j(i)$ and $i_1$ be as above. We let $p(I)$ be the admissible path
which consists of the three segments which connect the centers of the cubes with indices
$(i, 0)$, $(i, j(i))$, $(i_1, j(i))$ and $(i_1, m_1 + 1)$ in that order.

We make the analogous construction of paths which connect the boundary cubes in
column $j$ and denote these paths by $\gamma_j(I, J)$.

We shall now use these paths to prove the error estimate (1.17) for rings. Before
proceeding to the proof of (1.17), we remark that this inequality holds for general Lipschitz
domains $\Omega$. Indeed, using the known embedding of $BV(\Omega)$ into $L_2(\Omega)$, we have

(4.1)  $\|f - a\|_{L_2(\Omega)} \le C \|f - a\|_{BV(\Omega)},$

for any function $f$ and constant $a$. Therefore, taking the infimum over $a$, we obtain

(4.2)  $\|f - a_\Omega(f)\|_{L_2(\Omega)} \le C \inf_a \|f - a\|_{BV(\Omega)} \le C_1 |f|_{BV(\Omega)} = C_1 V_\Omega(f).$

The last inequality in (4.2) follows for example from elementary results in approximation
(see e.g. Theorem 3.5 in [DS]). It is easy to see that the constant $C_1$ is invariant under isotropic
scaling of $\Omega$, but grows under anisotropic (e.g. one directional) scaling. This reveals that $C_1$
strongly depends on the shape of $\Omega$. Our goal is to directly prove (1.17) with a constant
$C_1$ that is uniform for rings $\Omega = I_1 \setminus I_0$.

Let $S \in \mathcal{S}_k(\Omega)$ be a piecewise constant function on $\Omega$ with $k$ such that $2^{-2k}$ is less than $|I_1|$ and $2^{-2k}$ is less than $|I_0|$ in the case where $I_0$ is not empty. Given a path $\rho$, let

(4.3)  $J(\rho) := \sum_{L} |J_L|,$

where the sum is taken over all edges $L \in \mathcal{L}_k(\Omega^\circ)$ which are crossed by $\rho$. Here and later, we use the notation $K^\circ$ to denote the interior of a set $K \subset \mathbb{R}^2$. For each $i$, we define

(4.4)  $r_i := \sum_{\rho_i} J(\rho_i),$

where the sum is taken over all the paths $\rho_i$ associated to the row index $i$ (recall there are one or six such paths) and

(4.5)  $R := \sum_{i=1}^{m_1} r_i.$

Similarly, we define

(4.6)  $c_j := \sum_{\gamma_j} J(\gamma_j),$

where the sum is taken over all the paths $\gamma_j$ associated to the column index $j$ and

(4.7)  $C := \sum_{j=1}^{m_1} c_j.$

Lemma 4.1. For any ring $\Omega$ and any $S \in \mathcal{S}_k(\Omega)$, we have

(4.8)  $2^{-k}(R + C) \le 9 V_\Omega(S).$

Proof. We shall first estimate how often $|J_L|$, with $L$ a fixed vertical edge, $L \in \mathcal{L}_k(\Omega^\circ)$, appears in the sum $R + C$. Suppose first that $L$ is in an unobstructed row $i$. Then $|J_L|$ appears exactly once for paths $\rho_i$. The row $i$ is used at most four times for paths $\rho_{i'}$ with $i \ne i'$. The row $i$ is also used at most four times for paths $\gamma_j$. Hence $|J_L|$ appears at most 9 times in the sum $R + C$. Consider next the case when $i$ is obstructed. Then, $|J_L|$ appears exactly once for paths $\rho_i$ and it never appears for any other paths $\rho_{i'}$ or $\gamma_j$. The same estimate holds for $|J_L|$ when $L$ is a horizontal edge. Thus,

(4.9)  $2^{-k}(R + C) \le 9 \sum_{L \in \mathcal{L}_k(\Omega^\circ)} 2^{-k} |J_L| = 9 V_\Omega(S),$

where the last equality is given by (2.12). □

Remark 4.1 In the case $\Omega$ is a cube, the constant 9 in (4.8) can be replaced by 1.

Theorem 4.1. For any ring $\Omega = I_1 \setminus I_0$ and any function $f \in BV(\Omega)$, we have

(4.10)  $\|f - a_\Omega(f)\|_{L_2(\Omega)} \le 6\sqrt{3}\, V_\Omega(f).$

Proof. Let us first observe that it is sufficient to prove this estimate for the special case of functions $S \in \mathcal{S}_k(\Omega)$. Indeed, if this has been shown, then we have

(4.11)  $\|f - a_\Omega(f)\|_{L_2(\Omega)} \le \|f - P_k(f)\|_{L_2(\Omega)} + \|P_k(f) - a_\Omega(f)\|_{L_2(\Omega)},$

where $P_k$ is the projector onto $\mathcal{S}_k(\Omega)$. The first term tends to zero with $k$ and the second would provide our estimate since $a_\Omega(P_k(f)) = a_\Omega(f)$ and since by (2.11) $V_\Omega(P_k(f)) \le V_\Omega(f)$ if $k$ is sufficiently large.

Henceforth, we consider $f \in \mathcal{S}_k$, with $k$ such that $2^{-2k}$ is less than $|I_1|$ and $2^{-2k}$ is less than $|I_0|$ in the case where $I_0$ is not empty. Let $p_I = p_{i,j}$ denote the value of $f$ on the cube $I$ with $I$ in row $i$ and column $j$ (with similar notation for $I'$), let $\Lambda$ denote the set of $(i, j)$ such that the cube with index $(i, j)$ is contained in $\Omega$, and let $N := |\Lambda|$. Then, $A := a_\Omega(f) = N^{-1} \sum_{(i,j) \in \Lambda} p_{i,j}$. Therefore,

(4.12)  $|p_{i,j} - A| \le N^{-1} \sum_{(i',j') \in \Lambda} |p_{i,j} - p_{i',j'}|.$

We can construct an admissible path $\rho$ which connects the center of $I$ to the center of $I'$ using portions of the paths $\rho_i$ and $\gamma_{j'}$. Indeed, it is easy to see from our constructions that there is a path $\rho_i$ associated to row $i$ which passes through $I$ and a path $\gamma_{j'}$ associated to column $j'$ which passes through $I'$ such that $\rho_i$ intersects $\gamma_{j'}$. We take $\rho$ as the shortest path contained in $\rho_i \cup \gamma_{j'}$ which connects the center of $I$ to the center of $I'$. It follows that $|p_{i,j} - p_{i',j'}|$ does not exceed the sum of the $|J_L|$ crossed by this path. Hence,

(4.13)  $|p_{i,j} - p_{i',j'}| \le r_i + c_{j'}.$
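By a symmetric argument, we obtain that

(4.14)  $|p_{i,j} - p_{i',j'}| \le r_{i'} + c_j.$

By (4.13) we obtain

(4.15)  $|p_{i,j} - A| \le N^{-1} \sum_{(i',j') \in \Lambda} (r_i + c_{j'}) \le r_i + \frac{m_1 C}{N},$

and by (4.14)

(4.16)  $|p_{i,j} - A| \le c_j + \frac{m_1 R}{N}.$

Hence

$|p_{i,j} - A|^2 \le \left(r_i + \frac{m_1 C}{N}\right)\left(c_j + \frac{m_1 R}{N}\right) = r_i c_j + \frac{m_1}{N} r_i R + \frac{m_1}{N} c_j C + \frac{m_1^2}{N^2} RC.$

We note that $N 2^{-2k} = |\Omega| \ge \frac{3}{4} |I_1| = \frac{3}{4} m_1^2 2^{-2k}$. In other words, $m_1^2 \le \frac{4}{3} N$. Therefore, summing over the $N$ pairs $(i, j) \in \Lambda$ (at most $m_1$ values of $j$ for each $i$, and conversely), we obtain

$\|f - A\|^2_{L_2(\Omega)} = 2^{-2k} \sum_{(i,j) \in \Lambda} |p_{i,j} - A|^2 \le 2^{-2k}\left(RC + \tfrac{4}{3} R^2 + \tfrac{4}{3} C^2 + \tfrac{4}{3} RC\right)$

$\le \tfrac{4}{3}\, 2^{-2k} (R + C)^2 \le \tfrac{4}{3}\, 9^2\, V_\Omega(f)^2,$

where we have used Lemma 4.1. Since $\tfrac{4}{3} \cdot 81 = (6\sqrt{3})^2$, this proves (4.10). □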
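The estimate (4.10) can be probed numerically on a small grid. The following is a minimal sketch, assuming a 16 x 16 grid of cells, the whole grid as $I_1$, and a dyadic inner cube $I_0$; the helper names (`ring_mask`, `variation`, `l2_distance_to_mean`) are hypothetical and the variation is taken as $2^{-k}$ times the sum of jumps across edges interior to the ring, in the spirit of (2.12). It only illustrates the inequality on sample data and is not part of the proof.

```python
import numpy as np

def ring_mask(n, lo, hi):
    """Mask of the ring I_1 \\ I_0 on an n x n grid: I_1 is the whole grid,
    I_0 is the block of cells [lo, hi) x [lo, hi) (empty when lo == hi)."""
    mask = np.ones((n, n), dtype=bool)
    mask[lo:hi, lo:hi] = False
    return mask

def variation(f, mask, h):
    """h times the sum of |jumps| over edges whose two adjacent cells lie in the mask."""
    horiz = np.abs(np.diff(f, axis=1))[mask[:, 1:] & mask[:, :-1]].sum()
    vert = np.abs(np.diff(f, axis=0))[mask[1:, :] & mask[:-1, :]].sum()
    return h * (horiz + vert)

def l2_distance_to_mean(f, mask, h):
    """L2 norm over the ring of f minus its average; each cell has area h^2."""
    vals = f[mask]
    return h * np.sqrt(((vals - vals.mean()) ** 2).sum())

n = 16                       # 2^k cells per side with k = 4
h = 1.0 / n                  # cell side length 2^{-k}
mask = ring_mask(n, 4, 8)    # I_0 is a dyadic cube of side 1/4
rng = np.random.default_rng(0)
f = rng.random((n, n))       # sample piecewise constant function

lhs = l2_distance_to_mean(f, mask, h)
rhs = 6 * np.sqrt(3) * variation(f, mask, h)
print(lhs, rhs)
```

For rough data such as the random sample above the right-hand side is far larger than the left; the constant matters mostly for functions with few, well-placed jumps.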

Remark 4.2 In the case $\Omega$ is a cube, the constant $6\sqrt{3}$ in (4.10) can be replaced by 1.

5. Projections onto piecewise constant functions.

In  this  section,  we  shall  prove  the  BV  stability  of  projections  onto  a  space  of  piecewise 
constant  functions  related  to  a  partition  of  Q  into  rings. 

We denote by $\mathcal{P}$ a partition of $Q$ into a finite number of rings. This means that the elements of $\mathcal{P}$ are rings $K$ which are pairwise disjoint and whose union is $Q$. For each such partition $\mathcal{P}$, we define

(5.1)  $P_{\mathcal{P}}(f) := \sum_{K \in \mathcal{P}} a_K(f) \chi_K,$

where we recall that $a_K(f)$ is the average of $f$ over $K$ and $\chi_K$ is the characteristic function of $K$.

Theorem 5.1. For any finite partition $\mathcal{P}$ of $Q$ into rings and any $f \in BV(Q)$, we have

(5.2)  $V_Q(P_{\mathcal{P}}(f)) \le 10 V_Q(f).$

Proof. Let $k$ be large enough so that for any $K \in \mathcal{P}$, $K = I_1 \setminus I_0$, we have $|I_1| > 2^{-2k}$ and $|I_0| > 2^{-2k}$ if $I_0$ is not empty. Then $P_{\mathcal{P}}(f) = P_{\mathcal{P}}(P_k(f))$. Thus, in view of (2.11), it is enough to show that (5.2) holds for any $f \in \mathcal{S}_k$. We consider only such $f$ in the remainder of this proof.

If $L \in \mathcal{L}_k(Q)$, we denote by $J_L := J_L(f)$ the jump in $f$ across $L$ and by $J_L(P_{\mathcal{P}}(f))$ the jump in $P_{\mathcal{P}}(f)$ across $L$. For any set $R \subset Q$, we define

(5.3)  $\Sigma(f, R) := \sum_{L \in \mathcal{L}_k(R)} |J_L|.$

Fix one set $K$ from $\mathcal{P}$ and let $f_0$ be obtained from $f$ by redefining $f$ to be $a_K(f)$ on $K$. Note that the jumps in $f_0$ are the same as those of $f$ except for those inside $K$ (which will be 0 in $f_0$) and those on $\partial K$, the boundary of $K$. We shall prove that

(5.4)  $\Sigma(f_0, Q) \le \Sigma(f, Q) + 9\, \Sigma(f, K \setminus \partial K).$

Assume for the moment that we have proven (5.4). Then, repeating successively for each $K \in \mathcal{P}$ the process that constructs $f_0$ from $f$, we arrive at

(5.5)  $\Sigma(P_{\mathcal{P}}(f), Q) \le \Sigma(f, Q) + 9 \sum_{K \in \mathcal{P}} \Sigma(f, K \setminus \partial K) \le (1 + 9)\, \Sigma(f, Q).$

Since $V_Q(f) = 2^{-k} \Sigma(f, Q)$, (5.5) implies (5.2).

We finish the proof by proving (5.4). We shall use the paths that were constructed in §4. We fix a ring $K \in \mathcal{P}$ and we index the cubes $I \in \mathcal{D}_k(K) \cup \mathcal{B}_k(K)$ as in §4. Let $p_I = p_{i,j}$ denote the value of $f$ on $I$ when $I$ has index $(i, j)$. Let $J'_L := J_L(f_0)$ be the jump in $f_0$ across $L \in \mathcal{L}_k(Q^\circ)$. We need to estimate $J'_L$ for those $L$ contained in the boundary of $K$. To each such $L$, there is an $I = I(L) \in \mathcal{B}_k(K)$ which contains $L$ as one of its sides. We let $(i, j)$ denote the index of $I$. Then, we have

(5.6)  $|J'_L| \le N^{-1} \sum_{(i',j') \in \Lambda} |p_{i,j} - p_{i',j'}|,$

where as before $\Lambda$ denotes the set of $(i, j)$ such that the cube with index $(i, j)$ is contained in $K$, and $N = |\Lambda|$. Let $I'$ have index $(i', j')$. As in the proof of Theorem 4.1, using a subpath of one of the $\rho_i$ and a subpath of one of the $\gamma_{j'}$ (in the case $1 \le i \le m_1$) or from $\rho_{i'}$ and $\gamma_j$ (in the case $1 \le j \le m_1$), we can construct an admissible path $\rho(i, j, i', j')$ for $K$ which connects the center of $I$ to the center of $I'$. Let $T(i, j, i', j')$ denote the collection of all of the $M \in \mathcal{L}_k(Q)$ which intersect this path. Then,

(5.7)  $|J'_L| \le N^{-1} \sum_{(i',j') \in \Lambda} \; \sum_{M \in T(i,j,i',j')} |J_M|.$

Thus,

(5.8)  $\sum_{L \subset \partial K} |J'_L| \le N^{-1} \sum_{M \in \mathcal{L}_k(Q^\circ)} n_M |J_M|,$

where $n_M$ is the total number of times $M$ appears in all of the sets $T(i, j, i', j')$, with $(i, j)$ the index of a cube in $\mathcal{B}_k(K)$ and $(i', j')$ the index of a cube in $\mathcal{D}_k(K)$. We shall complete the proof by showing that

(i) $n_M = 0$, if $M$ is not contained in $\mathcal{L}_k(K) \cup \mathcal{L}_k(\partial K)$,

(ii) $n_M = N$, if $M \in \mathcal{L}_k(\partial K)$,

(iii) $n_M \le 9N$, if $M \in \mathcal{L}_k(K^\circ)$.

Clearly, these three estimates used in (5.8) prove (5.4).

Now, statement (i) is obvious because all the paths $\rho(i, j, i', j')$ are admissible for $K$. Statement (ii) is also obvious because $J_M$, $M \in \mathcal{L}_k(\partial K)$, is crossed only by the paths that emanate from $I(M)$ and there are exactly $N$ of these (one for each cube $I'$ in $\mathcal{D}_k(K)$). To prove (iii), consider for example a vertical segment $M \in \mathcal{L}_k(K \setminus \partial K)$. If $M$ is in an obstructed row, then for each $(i', j')$, $M$ will appear in exactly one $T(i, j, i', j')$; namely for one pair $(i, j)$ with $i$ the index of the row which contains $M$. So for these $M$, we have $n_M = N$. On the other hand, if $M$ is in an unobstructed row $i^*$, then for each $(i', j')$, $M$ will appear in only one of the $T(i^*, j, i', j')$ for the two values of $j$ corresponding to boundary cubes. At the same time, $M$ can appear at most four times in the sets $T(i, j, i', j')$, $1 \le i \le m_1$, $i \ne i^*$; namely for the one possible obstructed row with index $i$ which is congruent to $i^*$ mod $m_0$. Similarly, for each $(i', j')$, $M$ can appear at most four times in the sets $T(i, j, i', j')$, $1 \le j \le m_1$. Thus $n_M \le 9N$ in this case. We have proved (i)-(iii) and completed the proof of the theorem. □
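The stability estimate (5.2) can be illustrated on a discrete example. The sketch below is only an assumption-laden check, not the argument of the proof: it uses a 16 x 16 grid, a hypothetical partition of $Q$ into one genuine ring and four cubes encoded by an integer array `labels`, and the discrete variation $2^{-k}\Sigma(f, Q)$ computed from jumps across all interior edges.

```python
import numpy as np

def variation(f, h):
    """h times the sum of |jumps| of f across all interior edges of the grid."""
    return h * (np.abs(np.diff(f, axis=0)).sum() + np.abs(np.diff(f, axis=1)).sum())

def project(f, labels):
    """Replace f on each partition element by its average there, as in (5.1)."""
    flat = labels.ravel()
    sums = np.bincount(flat, weights=f.ravel())
    counts = np.bincount(flat)
    return (sums / counts)[labels]

n = 16
h = 1.0 / n
labels = np.zeros((n, n), dtype=int)   # label 0: ring I_1 \ I_0, I_1 = cells [0,8) x [0,8)
labels[:8, 8:] = 1                     # three remaining quadrants (cubes)
labels[8:, :8] = 2
labels[8:, 8:] = 3
labels[2:4, 2:4] = 4                   # inner dyadic cube I_0 of the ring

rng = np.random.default_rng(0)
f = rng.random((n, n))
pf = project(f, labels)
print(variation(pf, h), 10 * variation(f, h))
```

Averaging over partition elements preserves the overall mean of `f`, which gives a quick sanity check on `project` independent of the variation bound.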

6. A partition algorithm and a direct estimate for $\Sigma^r_N$.

In this section, we shall prove the direct estimate (1.7) for $\Sigma^r_N$. Our proof is based on two ingredients:

(i) The projection error inequality (1.17) for ring-shaped domains that was established in §4.

(ii) A general result on the partitioning of $Q$ into rings with respect to a super-additive function.

The proof of this second result will actually provide a concrete algorithmic procedure that builds adaptive partitions of $Q$ into rings for the approximation of a given function $f$.

If $f \in L_2(Q)$, we define

(6.1)  $\sigma^r_N(f) := \inf_{g \in \Sigma^r_N} \|f - g\|_{L_2(Q)},$

which is the error of approximation by the elements of $\Sigma^r_N$.

In the following, we let $\Phi$ denote a positive set function defined on the algebra $\mathcal{A}(Q)$ generated by the rings $K \subset Q$. That is, $\mathcal{A}(Q)$ consists of all subsets of $Q$ which can be formed by finite unions and intersections of rings $K \subset Q$ and their complements. We make the following assumptions on $\Phi$:

(i) $\Phi$ is super-additive: if $K_1$ and $K_2$ are disjoint sets in $\mathcal{A}(Q)$, we have

(6.2a)  $\Phi(K_1) + \Phi(K_2) \le \Phi(K_1 \cup K_2).$

(ii) $\Phi$ applied to cubes of decreasing size goes uniformly to zero, i.e.

(6.2b)  $\lim_{k \to \infty} \sup_{K \in \mathcal{D}_k(Q)} \Phi(K) = 0.$

Note that an immediate consequence of (6.2a) is that $\Phi(K_1) \le \Phi(K_2)$ when $K_1 \subset K_2$. We shall prove a general partitioning result with respect to such functions. In practice, we shall be interested in applying this result in the case where

(6.3)  $\Phi(K) = E_f(K) = \|f - a_K(f)\|^2_{L_2(K)},$

for $f \in L_2(Q)$, and also in the case where

(6.4)  $\Phi(K) = V_K(f) = |f|_{BV(K)},$

for $f \in BV(Q)$. It is easy to see that properties (i) and (ii) are satisfied in both of these cases (see [Z] for a proof of (ii) for the second example using a slight modification of the BV norm).

We next make some preliminary remarks which will be useful for stating and proving our main result (Theorem 6.1) of this section. Recall that each dyadic cube $I$ has four children $J$ (these are the dyadic cubes $J \subset I$ with $|J| = |I|/4$) and one parent. Given a function $\Phi$ as above and a parameter $\varepsilon > 0$, we define $T_\varepsilon$ to be the set of dyadic cubes $I \subset Q$ such that $\Phi(I) > \varepsilon$. The collection of cubes in $T_\varepsilon$ forms a tree, which means that whenever $I \in T_\varepsilon$ and $I \ne Q$, then its parent also belongs to $T_\varepsilon$. We also remark that $T_\varepsilon$ has finite cardinality, due to (6.2b).

In what follows, we shall assume that $\Phi(Q) \ne 0$ and that $\varepsilon$ is small enough so that $T_\varepsilon$ is not empty. In the tree $T_\varepsilon$, we shall make the distinction between several types of cubes:

(i) The set of final cubes $\mathcal{F}_\varepsilon$ consists of the elements $I \in T_\varepsilon$ with no child in $T_\varepsilon$.

(ii) The set $\mathcal{N}_\varepsilon$ of branching cubes consists of the elements $I \in T_\varepsilon$ with more than one child in $T_\varepsilon$.

(iii) The set $\mathcal{C}_\varepsilon$ of chaining cubes consists of the elements $I \in T_\varepsilon$ with exactly one child in $T_\varepsilon$.

From the fact that a branching cube always contains at least two final cubes, one easily derives

(6.5)  $|\mathcal{N}_\varepsilon| \le |\mathcal{F}_\varepsilon| - 1.$

The set $\mathcal{C}_\varepsilon$ can be partitioned into maximal chains $C_q$. That is, $\mathcal{C}_\varepsilon = \cup_{q=1}^{n} C_q$, where each $C_q$ is a sequence of $m = m(q)$ embedded cubes:

(6.6)  $C_q = (I_0, \dots, I_{m-1}),$

where $I_{k+1}$ is a child of $I_k$, and where $I_0$ (resp. $I_{m-1}$) is not a child (resp. parent) of a chaining cube.

The last cube $I_{m-1}$ of a chain $C_q$ always contains exactly one cube $I_m$ from $T_\varepsilon$ and this cube is either a final cube or a branching cube. The cube $I_m$ is uniquely associated to this chain. This shows that the number of chains $n = n(\varepsilon)$ satisfies

(6.7)  $n \le |\mathcal{N}_\varepsilon| + |\mathcal{F}_\varepsilon| - 1 \le 2|\mathcal{F}_\varepsilon| - 1.$
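The tree of dyadic cubes on which the set function exceeds the tolerance, and its split into final, branching and chaining cubes, can be sketched for sampled data as follows. This is a minimal illustration under stated assumptions: the function is piecewise constant on a $2^k \times 2^k$ grid, the set function is the squared $L_2$ error of the best constant as in (6.3), and the names `phi`, `build_tree` and `classify` are hypothetical. Cubes are encoded as `(row, column, size)` in cell units.

```python
import numpy as np

def phi(f, i, j, size):
    """Squared L2 error of the best constant on a dyadic cube (cell area is 1/f.size)."""
    block = f[i:i + size, j:j + size]
    return ((block - block.mean()) ** 2).sum() / f.size

def build_tree(f, eps):
    """Set of dyadic cubes (i, j, size) with phi > eps, found from the root down.
    Monotonicity of phi lets us stop descending at the first cube with phi <= eps."""
    tree, stack = set(), [(0, 0, f.shape[0])]
    while stack:
        i, j, size = stack.pop()
        if phi(f, i, j, size) > eps:
            tree.add((i, j, size))
            if size > 1:
                half = size // 2
                stack += [(i + a, j + b, half) for a in (0, half) for b in (0, half)]
    return tree

def classify(tree):
    """Split the tree into final (no child), branching (>1 child), chaining (1 child)."""
    final, branching, chaining = [], [], []
    for (i, j, size) in tree:
        half = size // 2
        kids = 0 if half == 0 else sum(
            (i + a, j + b, half) in tree for a in (0, half) for b in (0, half))
        target = final if kids == 0 else chaining if kids == 1 else branching
        target.append((i, j, size))
    return final, branching, chaining

rng = np.random.default_rng(1)
f = rng.random((16, 16))
tree = build_tree(f, eps=0.002)
final, branching, chaining = classify(tree)
print(len(final), len(branching), len(chaining))
```

The counts satisfy the inequality (6.5) for any nonempty tree, since each cube with at least two children in the tree splits off at least one additional final cube.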

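Our next theorem gives our main result of this section. It algorithmically constructs a partition $\mathcal{P}_\varepsilon$ of $Q$ into rings $K$ with $\Phi(K) \le \varepsilon$. It also describes a second partition $\tilde{\mathcal{P}}_\varepsilon$ whose sole purpose is to help count the number of rings in $\mathcal{P}_\varepsilon$.

Theorem 6.1. Let $\varepsilon > 0$ be such that $T_\varepsilon \ne \emptyset$. Then, there exist a partition $\mathcal{P}_\varepsilon$ of $Q$ into disjoint rings such that

(6.8)  $\Phi(K) \le \varepsilon$, if $K \in \mathcal{P}_\varepsilon$,

and a set $\tilde{\mathcal{P}}_\varepsilon = \tilde{\mathcal{P}}^1_\varepsilon \cup \tilde{\mathcal{P}}^2_\varepsilon$ of pairwise disjoint sets $K$ which are cubes (in the case $K \in \tilde{\mathcal{P}}^1_\varepsilon$) or rings (in the case $K \in \tilde{\mathcal{P}}^2_\varepsilon$) such that

(6.9)  $\Phi(K) > \varepsilon$, if $K \in \tilde{\mathcal{P}}_\varepsilon$,

and

(6.10)  $|\mathcal{P}_\varepsilon| \le 8|\tilde{\mathcal{P}}^1_\varepsilon| + 3|\tilde{\mathcal{P}}^2_\varepsilon| \le 8|\tilde{\mathcal{P}}_\varepsilon|.$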

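Proof. We define $\mathcal{P}_\varepsilon = \mathcal{P}^1_\varepsilon \cup \mathcal{P}^2_\varepsilon \cup \mathcal{P}^3_\varepsilon$, with

(i) $\mathcal{P}^1_\varepsilon$: all children $J$ of the final cubes $I \in \mathcal{F}_\varepsilon$.

(ii) $\mathcal{P}^2_\varepsilon$: the children $J$ of the branching cubes $I \in \mathcal{N}_\varepsilon$, such that $J \notin T_\varepsilon$.

(iii) $\mathcal{P}^3_\varepsilon$: rings and cubes obtained from the chains of $T_\varepsilon$ by an algorithm that we now describe.

If $C_q = (I_0, \dots, I_{m-1})$ is a maximal chain ($1 \le q \le n$), and $I_m$ is as above, then we associate a chain ring $K_q = I_0 \setminus I_m$ to each chain $C_q$. Note that

(6.11)  $\mathcal{P}^1_\varepsilon \cup \mathcal{P}^2_\varepsilon \cup \{K_q : q = 1, \dots, n\}$

is a partition of the cube $Q$. We next partition each chain ring $K_q$, $q = 1, \dots, n$, according to

(6.12)  $K_q = (I_{j_0} \setminus I_{j_1}) \cup (I_{j_1} \setminus I_{j_2}) \cup \cdots \cup (I_{j_{p-1}} \setminus I_{j_p}),$

where $0 = j_0 < j_1 < \cdots < j_p = m$ ($p = p(q)$) are uniquely defined by the following recursion algorithm: assuming that $j_k$ is defined, and that $j_k < m$, we choose $j_{k+1}$ as follows:

(i) if $\Phi(I_{j_k} \setminus I_m) \le \varepsilon$, then $j_{k+1} := m$, i.e. $p := k + 1$ and the algorithm terminates.

(ii) if $\Phi(I_{j_k} \setminus I_{j_k + 1}) > \varepsilon$, then $j_{k+1} := j_k + 1$.

(iii) if neither (i) nor (ii) applies, then $j_{k+1}$ is chosen such that $\Phi(I_{j_k} \setminus I_{j_{k+1}}) \le \varepsilon$ and $\Phi(I_{j_k} \setminus I_{j_{k+1} + 1}) > \varepsilon$. In other words, $j_{k+1}$ is the largest $j > j_k$ such that $\Phi(I_{j_k} \setminus I_j) \le \varepsilon$.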
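The three rules of this recursion can be sketched in code. The following is a minimal illustration, assuming the chain is represented only through a callable `phi(a, b)` standing for $\Phi(I_a \setminus I_b)$ with $a < b$ and nondecreasing in $b$; the function name `split_chain` and the additive toy weights are hypothetical and serve only to exercise the rules.

```python
def split_chain(phi, m, eps):
    """Breakpoints 0 = j_0 < j_1 < ... < j_p = m of a chain ring I_0 \\ I_m.
    phi(a, b) stands for Phi(I_a \\ I_b) and is assumed nondecreasing in b."""
    cuts = [0]
    while cuts[-1] < m:
        j = cuts[-1]
        if phi(j, m) <= eps:           # rule (i): the remaining ring is small, stop
            cuts.append(m)
        elif phi(j, j + 1) > eps:      # rule (ii): one step is already too large
            cuts.append(j + 1)
        else:                          # rule (iii): largest index keeping phi <= eps
            nxt = j + 1
            while phi(j, nxt + 1) <= eps:   # stops before m because phi(j, m) > eps
                nxt += 1
            cuts.append(nxt)
    return cuts

# Toy model: weights[k] plays the role of Phi(I_k \ I_{k+1}) and phi is additive.
weights = [0.2, 0.2, 0.2, 1.5, 0.1, 0.1]
phi = lambda a, b: sum(weights[a:b])
cuts = split_chain(phi, m=len(weights), eps=0.5)
print(cuts)
```

With these toy weights the second, fourth and last breakpoints come from rules (iii), (ii) and (i) respectively, so every resulting ring is either below the tolerance or a single parent-child step.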

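We can now define the set $\mathcal{P}^3_\varepsilon$. For each chain ring $K_q$, $q = 1, \dots, n$, we include in $\mathcal{P}^3_\varepsilon$:

(i) all rings $I_{j_k} \setminus I_{j_{k+1}}$ such that $\Phi(I_{j_k} \setminus I_{j_{k+1}}) \le \varepsilon$,

(ii) the children of $I_{j_k}$ $(J^1_{j_k}, J^2_{j_k}, J^3_{j_k})$ that differ from $I_{j_{k+1}}$, for all $k$ such that $\Phi(I_{j_k} \setminus I_{j_{k+1}}) > \varepsilon$ (in this case $j_{k+1} = j_k + 1$, i.e. $I_{j_{k+1}}$ is a child of $I_{j_k}$). Note that the cubes $(J^1_{j_k}, J^2_{j_k}, J^3_{j_k})$ are not in $T_\varepsilon$.

Because of (6.11), $\mathcal{P}_\varepsilon$ is a partition which clearly satisfies (6.8).

Next, we define $\tilde{\mathcal{P}}_\varepsilon := \tilde{\mathcal{P}}^1_\varepsilon \cup \tilde{\mathcal{P}}^2_\varepsilon$, where

(i) $\tilde{\mathcal{P}}^1_\varepsilon$ is the set of all of the final cubes of $T_\varepsilon$.

(ii) $\tilde{\mathcal{P}}^2_\varepsilon$ is a set of rings constructed by an algorithm that we now describe.

For each chain ring $K_q$, $q = 1, \dots, n$, we recall its decomposition according to $K_q = (I_{j_0} \setminus I_{j_1}) \cup \cdots \cup (I_{j_{p-1}} \setminus I_{j_p})$, and we construct a new decomposition

(6.13)  $K_q = (I_{s_0} \setminus I_{s_1}) \cup (I_{s_1} \setminus I_{s_2}) \cup \cdots \cup (I_{s_{r-1}} \setminus I_{s_r}),$

where $0 = s_0 < s_1 < \cdots < s_r = m$ ($r = r(q)$) constitute a subset of $(j_0, \dots, j_p)$ uniquely defined by the following recursion algorithm: assuming $s_k = j_l < m$ is defined,

(i) if $j_{l+1} = m$, we take $s_{k+1} := m$ and $r := k + 1$ and terminate the algorithm.

(ii) if $j_{l+1} < m$, and if $\Phi(I_{j_l} \setminus I_{j_{l+1}}) \le \varepsilon$, we take $s_{k+1} = j_{l+2}$. In the case that $j_{l+2} = m$, we terminate the algorithm.

(iii) if $j_{l+1} < m$, and if $\Phi(I_{j_l} \setminus I_{j_{l+1}}) > \varepsilon$, we take $s_{k+1} = j_{l+1}$.

For each chain ring $K_q$, $q = 1, \dots, n$, we then include in $\tilde{\mathcal{P}}^2_\varepsilon$ the rings $I_{s_k} \setminus I_{s_{k+1}}$, $k = 0, \dots, r - 2$, for which we have $\Phi(I_{s_k} \setminus I_{s_{k+1}}) > \varepsilon$ (by the construction of $\mathcal{P}^3_\varepsilon$) and we also include the last ring $I_{s_{r-1}} \setminus I_{s_r}$ only if it satisfies $\Phi(I_{s_{r-1}} \setminus I_{s_r}) > \varepsilon$. This means that we do not include any ring from the chain ring $K_q$ if $\Phi(K_q) \le \varepsilon$.

We now claim that

(6.14)  $|\mathcal{P}^3_\varepsilon| \le 3|\tilde{\mathcal{P}}^2_\varepsilon| + n \le 3|\tilde{\mathcal{P}}^2_\varepsilon| + 2|\mathcal{F}_\varepsilon| - 1 = 3|\tilde{\mathcal{P}}^2_\varepsilon| + 2|\tilde{\mathcal{P}}^1_\varepsilon| - 1.$

Indeed, each ring $I_{s_k} \setminus I_{s_{k+1}}$ of $\tilde{\mathcal{P}}^2_\varepsilon$ contains (as subsets) at most three rings of $\mathcal{P}_\varepsilon$ and in each chain $C_q$, $q = 1, \dots, n$, at most one ring of $\mathcal{P}^3_\varepsilon$ is not contained in some element of $\tilde{\mathcal{P}}^2_\varepsilon$. Finally, we prove the estimate (6.10). First, we clearly have

(6.15)  $|\mathcal{P}^1_\varepsilon| \le 4|\tilde{\mathcal{P}}^1_\varepsilon|$

and

(6.16)  $|\mathcal{P}^2_\varepsilon| \le 2|\mathcal{N}_\varepsilon| \le 2(|\mathcal{F}_\varepsilon| - 1) = 2(|\tilde{\mathcal{P}}^1_\varepsilon| - 1).$

Using these last two estimates with (6.14), we obtain

(6.17)  $|\mathcal{P}_\varepsilon| \le 3|\tilde{\mathcal{P}}^2_\varepsilon| + 8|\tilde{\mathcal{P}}^1_\varepsilon| - 3 \le 8|\tilde{\mathcal{P}}^1_\varepsilon| + 3|\tilde{\mathcal{P}}^2_\varepsilon| \le 8|\tilde{\mathcal{P}}_\varepsilon|.$

This proves (6.10) and completes the proof of the theorem. □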

We shall now use Theorem 4.1 to prove a direct estimate for approximation by the elements of $\Sigma^r_N$. To do so, we fix $f \in L_2(Q)$ which is not constant and we take for $\Phi$ the $L_2$-error function defined by (6.3). For each $\varepsilon > 0$, the algorithm described in the proof of Theorem 6.1 gives a partition $\mathcal{P}_\varepsilon = \mathcal{P}_\varepsilon(f)$ adapted to $f$. We then consider the piecewise constant approximation

(6.18)  $A_\varepsilon f := P_{\mathcal{P}_\varepsilon} f,$

where $P_{\mathcal{P}_\varepsilon}$ is defined by (5.1).

Theorem 6.2. If $f \in BV(Q)$ is not constant and if $\varepsilon > 0$, then the algorithm of Theorem 6.1, with $\Phi$ given by (6.3), produces a partition $\mathcal{P}_\varepsilon$ that satisfies

(6.19)  $|\mathcal{P}_\varepsilon| \le \frac{M}{\sqrt{\varepsilon}} V_Q(f), \quad M := 18\sqrt{3},$

and an approximation $A_\varepsilon f$ that satisfies

(6.20)  $\|f - A_\varepsilon f\|^2_{L_2(Q)} \le M \sqrt{\varepsilon}\, V_Q(f).$

Consequently, one has the Jackson estimate

(6.21)  $\sigma^r_N(f) \le M N^{-1/2} V_Q(f).$

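Proof. We consider the set $\tilde{\mathcal{P}}_\varepsilon$ with the properties indicated in the statement of Theorem 6.1. Using the error estimate (4.10) with constant $6\sqrt{3}$ for rings and 1 for cubes (see Remark 4.2) together with (6.10) we obtain

$\sqrt{\varepsilon}\, |\mathcal{P}_\varepsilon| \le \sqrt{\varepsilon}\, [8|\tilde{\mathcal{P}}^1_\varepsilon| + 3|\tilde{\mathcal{P}}^2_\varepsilon|]$

$\le 8 \sum_{K \in \tilde{\mathcal{P}}^1_\varepsilon} [\Phi(K)]^{1/2} + 3 \sum_{K \in \tilde{\mathcal{P}}^2_\varepsilon} [\Phi(K)]^{1/2}$

$\le 8 \sum_{K \in \tilde{\mathcal{P}}^1_\varepsilon} V_K(f) + 18\sqrt{3} \sum_{K \in \tilde{\mathcal{P}}^2_\varepsilon} V_K(f)$

$\le 18\sqrt{3} \sum_{K \in \tilde{\mathcal{P}}_\varepsilon} V_K(f) \le 18\sqrt{3}\, V_Q(f).$

Dividing by $\sqrt{\varepsilon}$, we obtain (6.19).

The approximation error (6.20) is then obtained from

$\|f - A_\varepsilon f\|^2_{L_2(Q)} = \sum_{K \in \mathcal{P}_\varepsilon} \Phi(K) \le |\mathcal{P}_\varepsilon|\, \varepsilon,$

and (6.19). If we take $\sqrt{\varepsilon} := M V_Q(f) / N$, then (6.19) and (6.20) imply (6.21). □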
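The arithmetic behind the choice of tolerance in this last step can be made explicit. The sketch below assumes only the bounds stated in Theorem 6.2, namely a ring count of at most $M V_Q(f)/\sqrt{\varepsilon}$ and a squared error of at most that count times $\varepsilon$; the function names and the sample values of $N$ and $V_Q(f)$ are hypothetical.

```python
import math

M = 18 * math.sqrt(3)   # the constant M of (6.19)

def tolerance_for(n_rings, variation):
    """Tolerance eps with sqrt(eps) = M * V_Q(f) / N, the choice made in the proof."""
    return (M * variation / n_rings) ** 2

def bounds(eps, variation):
    """Upper bound on the number of rings and on the squared L2 error for a given eps."""
    count_bound = M * variation / math.sqrt(eps)
    squared_error_bound = count_bound * eps
    return count_bound, squared_error_bound

N, V = 100, 2.0                       # sample budget of rings and sample variation
eps = tolerance_for(N, V)
count_bound, squared_error_bound = bounds(eps, V)
print(count_bound, math.sqrt(squared_error_bound), M * V / math.sqrt(N))
```

With this tolerance the count bound equals the budget $N$ and the error bound equals $M N^{-1/2} V_Q(f)$, which is the right-hand side of the Jackson estimate (6.21).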

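We can also obtain (6.21) by using the function $\Phi(K) = V_K(f)$. We now denote by $\mathcal{P}_\varepsilon(f)$ the resulting partition and by $A^*_\varepsilon f := P_{\mathcal{P}_\varepsilon(f)} f$ the resulting approximation when the tolerance is chosen as $\varepsilon$.

Theorem 6.3. If $f \in BV(Q)$, $V_Q(f) \ne 0$, $N > 0$ and $\varepsilon := 8 N^{-1} V_Q(f)$, then the algorithm of Theorem 6.1, with $\Phi$ given by (6.4), produces a partition $\mathcal{P}_\varepsilon$ that satisfies

(6.22)  $|\mathcal{P}_\varepsilon| \le N$

and an approximation $A^*_\varepsilon f$ that satisfies

(6.23)  $\|f - A^*_\varepsilon f\|_{L_2(Q)} \le 48\sqrt{3}\, N^{-1/2}\, V_Q(f).$

Proof. The proof is similar to the previous theorem. We consider the sets $\mathcal{P}_\varepsilon$ and $\tilde{\mathcal{P}}_\varepsilon$ of Theorem 6.1. Using (6.9) and (6.10), we have

$\varepsilon |\mathcal{P}_\varepsilon| \le 8 \varepsilon |\tilde{\mathcal{P}}_\varepsilon| \le 8 \sum_{K \in \tilde{\mathcal{P}}_\varepsilon} \Phi(K) = 8 \sum_{K \in \tilde{\mathcal{P}}_\varepsilon} V_K(f) \le 8 V_Q(f),$

which gives (6.22).

We use the error estimate (4.10) and (6.22) to obtain

$\|f - A^*_\varepsilon f\|^2_{L_2(Q)} = \sum_{K \in \mathcal{P}_\varepsilon} \|f - a_K(f)\|^2_{L_2(K)} \le (6\sqrt{3})^2 \sum_{K \in \mathcal{P}_\varepsilon} V_K(f)^2$

$\le (6\sqrt{3})^2 |\mathcal{P}_\varepsilon|\, \varepsilon^2 \le (48\sqrt{3})^2 N^{-1} V_Q(f)^2,$

which proves (6.23). □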

We close this section with the following simple remark about existence of best approximants from $\Sigma^r_N$.

Lemma 6.1. For each $f \in L_2(Q)$ and $N > 0$ there exists $g \in \Sigma^r_N$ such that

$\|f - g\|_{L_2(Q)} = \sigma^r_N(f).$

Proof. By the definition of $\sigma^r_N(f)$ (see (6.1)), there exist $g_1, g_2, \dots$ such that $g_j \in \Sigma^r_N$ and

$\|f - g_j\|_{L_2(Q)} \le \sigma^r_N(f) + j^{-1}.$

Let $\mathcal{P}_j$ ($|\mathcal{P}_j| = N$) be the partition for $g_j$ and furthermore let $K^j_m = I^j_m \setminus J^j_m \in \mathcal{P}_j$, $m = 1, 2, \dots, N$, be the rings of $\mathcal{P}_j$ with the indices selected such that $|K^j_1| \ge |K^j_2| \ge \cdots \ge |K^j_N|$.

By selecting a subsequence from $(g_j)$, we can find an $\eta > 0$ and an $N_0 \le N$ such that $|K^j_m| \ge \eta$, $1 \le m \le N_0$, $j = 1, 2, \dots$, and $|K^j_m| \to 0$, $j \to \infty$, $N_0 < m \le N$. It follows that for each $m$, either $|I^j_m| \ge \eta$ for all $j$ or $|I^j_m| \to 0$, $j \to \infty$. A similar statement applies to the $J^j_m$. Since there are only a finite number of dyadic cubes with measure $\ge \eta$, by again extracting a subsequence, we can assume that for each $m$, either $I^j_m$ does not change with $j$ or $|I^j_m| \to 0$. A similar statement applies to the $J^j_m$.

It follows that there exist disjoint rings $K^*_m$, $m = 1, \dots, N$, such that $|K^j_m \setminus K^*_m| + |K^*_m \setminus K^j_m| \to 0$, $j \to \infty$, and $|K^*_m| = 0$, $N_0 < m \le N$. It is now easy to see that $\|g - g_j\|_{L_2(Q)} \to 0$, $j \to \infty$, for

$g := \sum_{m=1}^{N_0} a_{K^*_m}(f) \chi_{K^*_m}.$

Therefore, $g$ satisfies the conclusions of the lemma. □

7. Minimization of the $K$-functional by piecewise constant approximation.

In this section, we shall use the Jackson and Bernstein estimates that we have proved for $\Sigma^r_N$ to show that a near minimizer for the problem (1.2), i.e. the $K$-functional, can be taken from some space $\Sigma^r_N$. We shall also show how the algorithm of the previous section can be used to find a near minimizer.

We begin with the following simple result.

Theorem 7.1. For each $f \in L_2(Q)$ and $N > 0$, and for each $\delta > 0$, there exists a function $h \in \Sigma^r_N$ such that

(7.1)  $\|f - h\|_{L_2(Q)} + N^{-1/2} V_Q(h) \le (1 + \delta)\, 18\sqrt{3}\, K(f, N^{-1/2}).$

Proof. If $K(f, N^{-1/2}) = 0$ then $f$ is constant and (7.1) follows by taking $h = f$. If $K(f, N^{-1/2}) \ne 0$ and $\delta > 0$, let $g \in BV(Q)$ satisfy

(7.2)  $\|f - g\|_{L_2(Q)} + N^{-1/2} V_Q(g) \le (1 + \delta) K(f, N^{-1/2}).$

Then, according to (6.21) of Theorem 6.2, for each $N$, there exists a function $g_N \in \Sigma^r_N$ such that

(7.3)  $\|g - g_N\|_{L_2(Q)} \le 18\sqrt{3}\, N^{-1/2}\, V_Q(g).$

We take $h := g_N$ so that

(7.4)  $\|f - h\|_{L_2(Q)} \le \|f - g\|_{L_2(Q)} + \|g - g_N\|_{L_2(Q)} \le \|f - g\|_{L_2(Q)} + 18\sqrt{3}\, N^{-1/2}\, V_Q(g) \le 18\sqrt{3}\,(1 + \delta)\, K(f, N^{-1/2}).$

We can estimate the variation of $h$ by Theorem 5.1. Since $h = P_{\mathcal{P}}\, g$ with $\mathcal{P}$ the partition for $h$, this gives

(7.5)  $V_Q(h) \le 10 V_Q(g) \le 10 (1 + \delta) N^{1/2} K(f, N^{-1/2}).$

Then, (7.4) together with (7.5) proves the theorem. □

We say that an element $g \in \Sigma^r_M$ is a near best approximation to $f \in L_2(Q)$ (with parameters $a \ge 1$, and $N \le M$) if

(7.6)  $\|f - g\|_{L_2(Q)} \le a\, \sigma^r_N(f).$

We next show that any such near best approximation is a near minimizer for (1.2).

Corollary 7.1. If $f \in L_2(Q)$ and $g \in \Sigma^r_N$ is a near best approximation with parameter $a$, then $g$ satisfies

(7.7)  $\|f - g\|_{L_2(Q)} + N^{-1/2} V_Q(g) \le C_0 K(f, N^{-1/2}),$

with $C_0 \le 2016a + 18a\sqrt{3}$.

Proof. Let $h \in \Sigma^r_N$ be the function of Theorem 7.1. Then,

(7.8)  $\|f - g\|_{L_2(Q)} \le a\, \sigma^r_N(f) \le a \|f - h\|_{L_2(Q)}.$

Also, since $g - h \in \Sigma^r_{4N}$, from the Bernstein estimate (3.9), we conclude that

$N^{-1/2} V_Q(g) \le N^{-1/2} V_Q(h) + N^{-1/2} V_Q(g - h) \le N^{-1/2} V_Q(h) + \frac{56}{\sqrt{3}} \|g - h\|_{L_2(Q)}$

$\le N^{-1/2} V_Q(h) + \frac{56}{\sqrt{3}} \left(\|f - g\|_{L_2(Q)} + \|f - h\|_{L_2(Q)}\right).$

Combining this with (7.8) gives that the left side of (7.7) does not exceed

$\left(a + \frac{56}{\sqrt{3}}(1 + a)\right) \|f - h\|_{L_2(Q)} + N^{-1/2} V_Q(h) \le \left(a + \frac{56}{\sqrt{3}}(1 + a)\right) \left(\|f - h\|_{L_2(Q)} + N^{-1/2} V_Q(h)\right).$

We now use (7.1) to arrive at (7.7). □

While Theorem 7.1 and Corollary 7.1 both provide near minimizers of (1.2), they are not of practical interest since they are not constructive. Yet, they show that a near minimizer for (1.2) can be taken from $\Sigma^r_N$ when $N$ is chosen so that $N^{-1/2}$ has the same order of magnitude as $t$.

We shall use the remainder of this section to prove that a near minimizer can also be obtained by applying the algorithm of the previous section to the function $f$. Recall that this algorithm is controlled by the parameter $\varepsilon > 0$: by decreasing $\varepsilon$, we increase the number of rings in the partition $\mathcal{P}_\varepsilon$ and we decrease the approximation error $\|f - A_\varepsilon f\|_{L_2(Q)}$. We thus have $A_\varepsilon f \in \Sigma^r_N$ with $N = N(\varepsilon)$ increasing as $\varepsilon$ goes to zero. In practice, we would like to directly control the number of rings. This leads to the following question: given $N > 0$, can we find $\varepsilon(N)$ such that $|\mathcal{P}_\varepsilon| = N$, or equivalently does the function $N(\varepsilon)$ reach all possible values of $N \in \mathbb{N}$? Strictly speaking, the answer to this question is negative. However, we can circumvent this difficulty as we shall now describe.

For a given $f$, and a given $N \in \mathbb{N}$, we define

(7.9)  $\varepsilon(N) := \min\{\varepsilon \ge 0 \; ; \; |\mathcal{P}_\varepsilon| \le N\},$

(7.10)  $\mathcal{P}_N := \mathcal{P}_{\varepsilon(N)},$

and

(7.11)  $A_N f := A_{\varepsilon(N)} f = P_{\mathcal{P}_N} f.$

If $\varepsilon(N) > 0$, the minimum is attained in (7.9). Indeed, the construction of $T_\varepsilon$, $\mathcal{P}_\varepsilon$ and $A_\varepsilon f$ described in the previous section ensures that, for any given $\varepsilon > 0$, there exists $\delta > 0$ small enough so that $T_{\varepsilon + s} = T_\varepsilon$, $\mathcal{P}_{\varepsilon + s} = \mathcal{P}_\varepsilon$ and $A_{\varepsilon + s} f = A_\varepsilon f$, for all $0 \le s \le \delta$.

If $\varepsilon(N) = 0$, then from Lemma 6.1, $f \in \Sigma^r_N$. We can therefore apply the algorithm with $\varepsilon = 0$ since the tree $T_0$ will be finite. With this choice, the algorithm gives $A_0 f = f$ and therefore $A_N f = f$ as well.

In order to prove that $A_N f$ is a near minimizer for the $K$-functional, we first need two lemmas that will be used to compare the partition $\mathcal{P}_N$ produced by the algorithm and the partition that is associated to the element $g \in \Sigma^r_N$ which is a known minimizer.

Lemma 7.1. If $\mathcal{P}$ is a finite set of pairwise disjoint rings and $\mathcal{P}'$ a partition of $Q$ into a finite number of rings, then for each $K' \in \mathcal{P}'$, there are at most two sets $K \in \mathcal{P}$ such that $K \cap K' \ne \emptyset$ but $K$ is not contained in $K'$.

Proof. Let $K' = I' \setminus J'$ where $J' \subset I'$ and $J'$ may possibly be empty. If $K = I \setminus J$ is in $\mathcal{P}$ and $K \cap K' \ne \emptyset$, then $I \cap I' \ne \emptyset$. Hence either $I \subset I'$ or $I' \subset I$. We shall show there is at most one $K$ of each of these types that intersects $K'$ but is not contained in $K'$.

(i) Case 1: $I' \subset I$. Suppose that there were two sets $K_1 = I_1 \setminus J_1$ and $K_2 = I_2 \setminus J_2$ from $\mathcal{P}$ with $I' \subset I_1, I_2$. Then, obviously $I_1 \cap I_2 \ne \emptyset$ and hence without loss of generality $I' \subset I_1 \subset I_2$. For $K_1$ and $K_2$ to be disjoint (as they must be since both are in $\mathcal{P}$) we must have $I_1 \subset J_2$. But this means $K_2$ does not intersect $K'$, which is a contradiction. Thus, we have shown there is only one set $K$ of this type.

(ii) Case 2: $I \subset I'$. Suppose again that there were two sets $K_1 = I_1 \setminus J_1$ and $K_2 = I_2 \setminus J_2$ from $\mathcal{P}$ with $I' \supset I_1, I_2$. Then, $I_i \supset J'$, $i = 1, 2$, since otherwise $K_i \subset K'$. Hence, $J' \subset I_1, I_2$. Obviously, $I_1 \cap I_2 \ne \emptyset$ and hence without loss of generality $I_1 \subset I_2 \subset I'$. Since $K_1 \cap K_2 = \emptyset$, we have

$J' \subset I_1 \subset J_2 \subset I_2 \subset I'.$

Since $J' \subset I_1 \subset J_2$, this is a contradiction since it implies that $K_2 \subset K'$. □

Lemma 7.2. If $\mathcal{P}$ is a finite set of pairwise disjoint rings and $\mathcal{P}'$ a partition of $Q$ into a finite number of rings, and if $|\mathcal{P}'| \le N$ and $|\mathcal{P}| \ge 2N$, then the subset $\mathcal{P}^1$ of all $K \in \mathcal{P}$ contained in some $K' \in \mathcal{P}'$ satisfies $|\mathcal{P}^1| \ge N$.

Proof. Let us denote by $\mathcal{P}^2$ the set of all $K \in \mathcal{P}$ that are not contained in any $K' \in \mathcal{P}'$, and by $\mathcal{P}^3$ the set of $K' \in \mathcal{P}'$ such that there exists $K \in \mathcal{P}^2$ having a non-empty intersection with $K'$.

By the previous lemma, each $K' \in \mathcal{P}^3$ is associated with at most two $K \in \mathcal{P}^2$ such that $K$ and $K'$ are not disjoint. On the other hand, each $K \in \mathcal{P}^2$ is associated to at least two $K' \in \mathcal{P}^3$ such that $K$ and $K'$ are not disjoint. We thus have necessarily

$|\mathcal{P}^2| \le |\mathcal{P}^3| \le |\mathcal{P}'| \le N,$

so that $|\mathcal{P}^1| = |\mathcal{P}| - |\mathcal{P}^2| \ge 2N - N = N$. □

We  are  now  ready  to  prove  the  main  result  of  this  section. 

Theorem  7.2.  Let  f  £  ^(Q)  and  N  >  1  be  an  integer  and  M  :=  16iV.  The  function 
Am/  =  Ae(M)f  JS  a  near  best  approximation  to  f  in  the  sense  of  (7.6)  and  satisfies 

(7.12)  ||/  -  AmJWl.aq)  +  N-'S\’Q{AMf)  <  C’„K(f,N-'C-h 

with  Cf  =  8  Co  and  C 0  the  constant  of  Corollary  7.1. 

Proof. We consider first the case that $\epsilon := \epsilon(M) > 0$. Let $g$ be a best approximation to $f$ from $\Sigma^r_N$ and $\mathcal{P}'$ be the partition associated to $g$. Fix an arbitrary $0 < \eta < \epsilon$ and let $\mathcal{P} = \mathcal{P}_\eta$ be the partition of Theorem 6.1. Then, using the fact that $\eta < \epsilon$ together with Theorem 6.1, we find $M \le |\mathcal{P}_\eta(f)| \le 8|\mathcal{P}|$. Hence $|\mathcal{P}| \ge 2N$ and we can apply Lemma 7.2 to find a set $\mathcal{P}_1 \subset \mathcal{P}$ with $|\mathcal{P}_1| \ge N$ such that each element $K \in \mathcal{P}_1$ is contained in some ring of $\mathcal{P}'$. It follows that


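$$N\eta^2 \le \sum_{K \in \mathcal{P}_1} \|f - a_K(f)\|^2_{L_2(K)} \le \|f - g\|^2_{L_2(Q)} = \sigma^r_N(f)^2.$$

Since $\eta < \epsilon$ is arbitrary, we have

$$N\epsilon^2 \le \sigma^r_N(f)^2.$$

Therefore,

$$\|f - A_M f\|^2_{L_2(Q)} = \sum_{K \in \mathcal{P}_\epsilon} \|f - a_K(f)\|^2_{L_2(K)} \le M\epsilon^2 \le 16\,\sigma^r_N(f)^2.$$

Thus $A_M f$ is a near best approximation to $f$ with parameter $a = 4$ and (7.12) follows from Corollary 7.1.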

In the second case, where $\epsilon(M) = 0$, we have $A_M f = A_0 f = f$ and $f \in \Sigma^r_M$. The left side of (7.12) does not exceed $N^{-1/2} V_Q(f)$. Let $h$ be the function of Theorem 7.1. Since $f - h \in \Sigma^r_{2(N+M)} = \Sigma^r_{34N}$, we have from the Bernstein inequality (3.9)

$$\|f - h\|_{L_2(Q)} + N^{-1/2} V_Q(h) \ge \frac{\sqrt{3}}{28\sqrt{34}} N^{-1/2} V_Q(f - h) + N^{-1/2} V_Q(h) \ge \frac{\sqrt{3}}{28\sqrt{34}} N^{-1/2} V_Q(f).$$

Hence, the left side of (7.12) does not exceed

$$\frac{28\sqrt{34}}{\sqrt{3}} \left[ \|f - h\|_{L_2(Q)} + N^{-1/2} V_Q(h) \right]$$

and the proof is completed by invoking inequality (7.1). □

8.  Direct  estimates  for  Haar  thresholding. 

In this section, we fix a function $f$ in BV and show that its Haar coefficients are in weak $\ell_1$. That is, we shall show that when the Haar coefficients are arranged in decreasing order of absolute value, the $n$-th rearranged coefficient is in absolute value less than $C|f|_{BV}/n$, with $C$ an absolute constant. We shall see that this also yields the Jackson estimate (1.7) for $\Sigma_N$.

In the next section, we shall then use this result to show that the extremal problems (1.1) and (1.2) have near minimizers which can be obtained by wavelet thresholding of the coefficients with respect to the Haar basis.

Associated to each dyadic cube $I = [2^{-j}k_1, 2^{-j}(k_1 + 1)) \times [2^{-j}k_2, 2^{-j}(k_2 + 1))$, there are three Haar coefficients $c^e_{j,k} = \langle f, H^e_{j,k} \rangle$, $e \in V$, $k = (k_1, k_2)$, with $V$ the set of nonzero vertices of the square $Q = [0, 1]^2$ (see (1.10-11)). In this section as well as in §9, we shall denote any of these by $c_I = c_I(f)$ and the corresponding Haar function by $H_I$; when we state a property about $c_I$, we mean any of these three coefficients, and similarly for $H_I$.

We shall assume without loss of generality that $f$ has mean value zero, so that the coefficient of $\varphi_Q$ is zero. We shall denote by $\gamma_n(f)$ the $n$-th largest of the absolute values of the Haar coefficients $c^e_I$ of $f$, $I \in \mathcal{D}$, $e \in V$.

We  begin  with  the  following  well-known  lemma. 

Lemma 8.1. If $f \in BV(Q)$ and $\epsilon > 0$, then there exists a continuous function $f_\epsilon$ which is piecewise continuously differentiable on $Q$ such that

(8.1)  $\|f - f_\epsilon\|_{L_2(Q)} \le \epsilon$

and

(8.2)  $V_Q(f_\epsilon) \le V_Q(f)$.

Proof. This can be proved in many ways by mollification, for example using Steklov averages. We shall prove this by using piecewise bilinear interpolants. We recall (see (2.11)) that

(8.3)  $V_Q(P_k f) \le V_Q(f)$,

where $P_k$ is the projector onto $S_k$. Since $\|f - P_k f\|_{L_2(Q)}$ goes to zero as $k$ tends to infinity, it is sufficient to prove the result assuming that $f$ is in $S_k$.

For such an $f$, and $0 < \epsilon < 2^{-k-1}$, we define a tensor product grid

(8.4)  $\Gamma_\epsilon := \Gamma^1_\epsilon \times \Gamma^1_\epsilon$,

where the univariate grid $\Gamma^1_\epsilon$ is defined by

(8.5)  $\Gamma^1_\epsilon := \{0, 1\} \cup \{2^{-k}n + \epsilon \,;\, n = 0, \ldots, 2^k - 1\} \cup \{2^{-k}n - \epsilon \,;\, n = 1, \ldots, 2^k\}$.

The function $f$ is well defined at each point in $\Gamma_\epsilon$. Let $f_\epsilon$ be the function which is piecewise bilinear relative to $\Gamma_\epsilon$ and interpolates $f$ at each grid point in $\Gamma_\epsilon$. That is, $f_\epsilon$ is the unique continuous function which is piecewise bilinear (i.e. of the form $a + bx + cy + dxy$) on each rectangular patch defined by $\Gamma_\epsilon$ and equal to $f$ on $\Gamma_\epsilon$.

One easily checks that, by construction,

(8.6)  $V_Q(f_\epsilon) \le V_Q(f)$.

On the other hand, it is clear that $f_\epsilon$ tends to $f$ in $L_2(Q)$ as $\epsilon$ goes to zero. □
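
The grid used in this proof can be made concrete with a short sketch. The code below is only an illustration: the function names `univariate_grid` and `tensor_grid` are hypothetical, and exact rational arithmetic is an implementation choice, not part of the paper. It builds the point set of (8.5) and shows that the interior dyadic breakpoints $2^{-k}n$, where a function in $S_k$ may jump, are not grid points.

```python
from fractions import Fraction


def univariate_grid(k, eps):
    """Sorted points of the univariate grid (8.5) for level k and offset eps."""
    eps = Fraction(eps)
    step = Fraction(1, 2 ** k)
    if not 0 < eps < step / 2:
        raise ValueError("need 0 < eps < 2**(-k-1)")
    points = {Fraction(0), Fraction(1)}
    # Points just to the right of each breakpoint 2^{-k} n, n = 0, ..., 2^k - 1.
    points |= {n * step + eps for n in range(0, 2 ** k)}
    # Points just to the left of each breakpoint 2^{-k} n, n = 1, ..., 2^k.
    points |= {n * step - eps for n in range(1, 2 ** k + 1)}
    return sorted(points)


def tensor_grid(k, eps):
    """Tensor product of the univariate grid with itself, as a list of pairs."""
    g = univariate_grid(k, eps)
    return [(x, y) for x in g for y in g]


if __name__ == "__main__":
    print([str(p) for p in univariate_grid(1, Fraction(1, 10))])
```

Each interior breakpoint is bracketed by two grid points at distance $\epsilon$ on either side, so the bilinear interpolant replaces every jump of $f$ by a steep ramp of width $2\epsilon$.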

In view of Lemma 8.1, we can assume from now on, without loss of generality, that $f$ is continuous and piecewise continuously differentiable on $Q$. Then,

(8.7)  $V_K(f) = \int_K \left[ \left| \frac{\partial f}{\partial x_1} \right| + \left| \frac{\partial f}{\partial x_2} \right| \right]$

for any ring $K$. Therefore, $V(K) := V_K(f)$ is set additive on rings, i.e. $V(K_1 \cup K_2) = V(K_1) + V(K_2)$ for any two disjoint rings $K_1$ and $K_2$.

Theorem 8.1. For each $f \in BV(Q)$ and each $n \ge 1$, we have

(8.8)  $\gamma_n(f) \le C_1 \frac{V_Q(f)}{n}$

with $C_1 = 36 C_1'$ and $C_1' := 216\sqrt{5} + 72\sqrt{3}$.

Proof. We can assume that $f$ is continuous and piecewise continuously differentiable on $Q$. We can also assume that $V_Q(f) = 1$, since the general case then follows by scaling. We shall show that there is a set $\Lambda_n \subset \mathcal{D}$ such that

(i) $|\Lambda_n| \le 6 \cdot 2^n$, $n = 1, 2, \ldots$,

(ii) $|c_I| \le C_1' 2^{-n}$, $I \notin \Lambda_n$,

where in (ii), $c_I$ is any of the three Haar coefficients associated to $I$. It is easy to see that this implies (8.8).

We shall use constructions of trees similar to that in §6. We shall also use the abbreviated notation $V(S) := V_S(f)$ for any set $S$ in the algebra of rings. For each $m = 1, 2, \ldots$, let $T_m$ denote the collection of all cubes $I \in \mathcal{D}$ for which $V(I) \ge 2^{-m}$. The cubes in $T_m$ form a tree. Note also that the tree $T_m$ is contained in the tree $T_{m+1}$ and we can obtain $T_{m+1}$ from $T_m$ by growing $T_m$.

We shall give each cube $I \in \mathcal{D}$ an index $m(I)$ as follows. We consider the four children $J_i \subset I$, $i = 1, 2, 3, 4$, of $I$. We can write $V(J_i) = 2^{-m_i + \epsilon_i}$, where $m_i$ is a nonnegative integer (or $m_i = \infty$) and $0 \le \epsilon_i < 1$. We define $m(I)$ as the second smallest of the four numbers $m_i$. Another way to describe $m(I)$ (when it is finite) is that it is the smallest integer $m$ such that $I$ has at least two of its children in $T_m$. Note also that if $I$ has index $m$ then $I \in T_{m-1}$ and $I$ has at least two children in $T_m$. We have remarked in §6 that for any tree the number of branching cubes (i.e. cubes with at least two children in the tree) does not exceed the number of final cubes. Since the final cubes of $T_m$ are disjoint and on each final cube $J$, $V(J) \ge 2^{-m}$, it follows that there are at most $2^m$ cubes $I$ in $\mathcal{D}$ with index $m$.
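
The two descriptions of the index $m(I)$ can be compared in a small sketch. This is an illustration under stated assumptions: the function names are hypothetical, the child variations are passed in as plain numbers normalized so that $0 \le V(J_i) \le 1$, and the convention $0 \le \epsilon_i < 1$ is used, so that $m_i$ is the smallest integer with $V(J_i) \ge 2^{-m_i}$.

```python
import math


def dyadic_exponent(v):
    """Smallest integer m >= 0 with v >= 2**(-m); math.inf when v == 0."""
    if not 0 <= v <= 1:
        raise ValueError("variation is assumed normalized to [0, 1]")
    if v == 0:
        return math.inf
    return math.ceil(-math.log2(v))


def cube_index(child_variations):
    """Index m(I): the second smallest of the four exponents m_i."""
    exponents = sorted(dyadic_exponent(v) for v in child_variations)
    return exponents[1]


def cube_index_by_definition(child_variations, max_m=64):
    """Smallest m such that at least two children satisfy V(J) >= 2**(-m)."""
    for m in range(max_m + 1):
        if sum(1 for v in child_variations if v >= 2.0 ** (-m)) >= 2:
            return m
    return math.inf


if __name__ == "__main__":
    children = [0.5, 0.3, 0.1, 0.0]
    print(cube_index(children), cube_index_by_definition(children))
```

Both functions return the same value on these inputs, which reflects the remark that $m(I)$ is the first level at which $I$ becomes a branching cube of $T_m$.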

We shall also define a distance between two dyadic cubes $J \subset I$. This distance is the difference of the dyadic levels of $J$ and $I$, i.e.

$d(I, J) := \frac{1}{2}(\log_2 |I| - \log_2 |J|)$.

We fix $n > 0$ and define for all $0 \le m \le n$ the set $\Lambda_m$ consisting of the cubes $I$ in $T_n$ which contain a cube $J$ with index $m = m(J)$ which satisfies $d(I, J) \le 2(n - m)$. We thus have

(8.9)  $|\Lambda_m| \le [2(n - m) + 1] 2^m$, $m = 0, 1, \ldots, n$.

Defining $\Lambda_n := \bigcup_{m=0}^{n} \Lambda_m$, it follows that

(8.10)  $|\Lambda_n| \le \sum_{m=0}^{n} [2(n - m) + 1] 2^m \le 6 \cdot 2^n - 1$,

so that (i) is satisfied.
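
The counting bound (8.10) is elementary arithmetic and can be checked directly. The sketch below is only a sanity check, with the hypothetical helper name `lambda_bound`; it evaluates the sum in (8.10) and compares it with $6 \cdot 2^n - 1$. The closed form $6 \cdot 2^n - 2n - 5$ used in the comparison is not stated in the text; it follows by induction on $n$.

```python
def lambda_bound(n):
    """Value of the sum in (8.10): sum over m = 0..n of [2(n - m) + 1] * 2**m."""
    return sum((2 * (n - m) + 1) * 2 ** m for m in range(n + 1))


if __name__ == "__main__":
    for n in range(6):
        # Columns: n, the sum, its closed form, and the bound used in (8.10).
        print(n, lambda_bound(n), 6 * 2 ** n - 2 * n - 5, 6 * 2 ** n - 1)
```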

To prove (ii), let $I \in \mathcal{D}$ be a cube not in $\Lambda_n$. We consider two cases. The first case is when $I \notin T_n$. In this case $V(I) < 2^{-n}$. Let (as before) $a_I := a_I(f)$ be the average of $f$ on $I$. By Remark 4.2, we have for any of the three coefficients $c_I$,

(8.11)  $|c_I| \le \left| \int_I (f(x) - a_I) H_I(x)\,dx \right| \le \|f - a_I\|_{L_2(I)} \le V(I) < 2^{-n}$.

Hence, we have verified (ii) in this case.

Consider now the remaining case when $I \in T_n$. We define a chain of cubes $I = I_0 \supset I_1 \supset \cdots \supset I_r$ as follows: given that $I_j$ has been defined, we define $I_{j+1}$ as the child of $I_j$ in $T_n$ on which $f$ has largest variation. The chain terminates when $I_r$ is a final leaf in $T_n$. Let $K_j := I_j \setminus I_{j+1}$, $j = 0, \ldots, r - 1$, and $K_r := I_r$. The three children $J$ of $I_j$ different from $I_{j+1}$ all satisfy $V(J) < 2^{-m(I_j)+1}$. It follows from the additivity of $V$ that

(8.12)  $V(K_j) \le 6 \cdot 2^{-m(I_j)}$, $j = 0, \ldots, r - 1$.


We can now estimate any of the three Haar coefficients $c_I$ as follows. We define

(8.13)  $g := \sum_{j=0}^{r} a_{K_j} \chi_{K_j}$,

where

(8.14)  $a_{K_j} := \frac{1}{|K_j|} \int_{K_j} f(x)\,dx$.

We let $H_I$ denote the Haar function associated to $I$ and $c_I$. Then,

$$|c_I| = \left| \int_{I_0} f(x) H_I(x)\,dx \right| \le |I_0|^{-1/2} \int_{I_0} |f(x) - g(x)|\,dx + \left| \int_{I_0} g(x) H_I(x)\,dx \right| =: \eta_1 + \eta_2.$$

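We can estimate $\eta_1$ by using Theorem 4.1 and the Cauchy-Schwarz inequality. This gives

$$\eta_1 \le |I_0|^{-1/2} \sum_{j=0}^{r} \|f - g\|_{L_1(K_j)} \le |I_0|^{-1/2} \sum_{j=0}^{r} \|f - a_{K_j}\|_{L_2(K_j)} |K_j|^{1/2} \le 6\sqrt{3}\, |I_0|^{-1/2} \sum_{j=0}^{r} V(K_j) |K_j|^{1/2} \le 6\sqrt{3} \sum_{j=0}^{r} 2^{-j} V(K_j).$$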

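We now show a similar estimate for $\eta_2$. Since $g$ is constant on each ring $K_j$, we get

$$\eta_2 \le |I_0|^{-1/2} \int_{I_1} |g(x) - a_{K_0}|\,dx = |I_0|^{-1/2} \sum_{j=1}^{r} \int_{K_j} |g(x) - a_{K_0}|\,dx = |I_0|^{-1/2} \sum_{j=1}^{r} |a_{K_j} - a_{K_0}|\,|K_j| \le |I_0|^{-1/2} \sum_{j=1}^{r} |K_j| \sum_{\mu=1}^{j} |a_{K_\mu} - a_{K_{\mu-1}}|.$$

We now change the order of summation to find

$$\eta_2 \le |I_0|^{-1/2} \sum_{\mu=1}^{r} |a_{K_\mu} - a_{K_{\mu-1}}| \sum_{j=\mu}^{r} |K_j| = |I_0|^{-1/2} \sum_{\mu=1}^{r} |a_{K_\mu} - a_{K_{\mu-1}}|\,|I_\mu|.$$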

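For each $\mu$, the set $K := K_\mu \cup K_{\mu-1}$ is a ring and if $a$ is the average of $f$ over $K$, then

$$\begin{aligned}
|a_{K_\mu} - a_{K_{\mu-1}}| &\le |a_{K_\mu} - a| + |a_{K_{\mu-1}} - a| \\
&\le \frac{1}{|K_\mu|} \int_{K_\mu} |f(x) - a|\,dx + \frac{1}{|K_{\mu-1}|} \int_{K_{\mu-1}} |f(x) - a|\,dx \\
&\le \frac{1}{|K_\mu|} \int_{K} |f(x) - a|\,dx \\
&\le |K|^{1/2} |K_\mu|^{-1} \|f - a\|_{L_2(K)} \le 6\sqrt{3}\sqrt{5}\, |K_\mu|^{-1/2} V(K).
\end{aligned}$$

Since $|I_0|^{-1/2} |K_\mu|^{-1/2} |I_\mu| \le \frac{2}{\sqrt{3}} 2^{-\mu}$, we obtain

(8.15)  $\eta_2 \le 12\sqrt{5} \sum_{\mu=1}^{r} (V(K_\mu) + V(K_{\mu-1}))\, 2^{-\mu}$.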

This together with the estimate of $\eta_1$ shows that

(8.16)  $|c_I| \le (18\sqrt{5} + 6\sqrt{3}) \sum_{j=0}^{r} 2^{-j} V(K_j) = \sum_{k=0}^{n} S_k$,

where $S_k$ consists of that portion of the sum on the right side of (8.16) corresponding to the terms for which $m(I_j) = k$. Then, as we have shown earlier, $V(K_j) \le 6 \cdot 2^{-k}$ for each such $j$. Also, $I_j$ is at a distance $> 2(n - k)$ from $I$ because of the definition of $\Lambda_k$ and $\Lambda_n$. Hence,

(8.17)  $S_k \le (108\sqrt{5} + 36\sqrt{3})\, 2^{-k} \sum_{\nu = 2(n-k)+1}^{\infty} 2^{-\nu} = (108\sqrt{5} + 36\sqrt{3})\, 2^{-2n+k}$.

We now return to (8.16) to find that

(8.18)  $|c_I| \le (108\sqrt{5} + 36\sqrt{3}) \sum_{k=0}^{n} 2^{-2n+k} \le (216\sqrt{5} + 72\sqrt{3})\, 2^{-n}$.

Thus, we have provided the desired estimate for these $I$ as well. □
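
The two geometric sums behind (8.17) and (8.18) can be checked in exact arithmetic. The sketch below uses hypothetical helper names and truncates the infinite tail after finitely many terms, which is an assumption made only so the sum can be evaluated; the truncated value approaches the closed form $2^{-2(n-k)}$ from below.

```python
from fractions import Fraction


def truncated_tail(a, terms=60):
    """Partial sum of 2**(-nu) for nu = a, ..., a + terms - 1."""
    return sum(Fraction(1, 2 ** nu) for nu in range(a, a + terms))


def sum_over_k(n):
    """Exact value of the sum over k = 0..n of 2**(-2n + k), as in (8.18)."""
    return sum(Fraction(2 ** k, 4 ** n) for k in range(n + 1))


if __name__ == "__main__":
    n, k = 5, 3
    a = 2 * (n - k) + 1
    # The tail starting at nu = 2(n - k) + 1 sums to 2**(-2(n - k)).
    print(truncated_tail(a), Fraction(1, 4 ** (n - k)))
    # The sum over k stays below 2 * 2**(-n), which gives the factor 2 in (8.18).
    print(sum_over_k(n), Fraction(2, 2 ** n))
```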

Theorem 8.1 immediately yields a direct estimate for Haar thresholding. For this, we define two nonlinear operators associated to the Haar decomposition. Let $f$ have mean value zero on $Q$ and $f = \sum c^e_I H^e_I$. We define for $\epsilon > 0$

(8.19)  $\mathcal{H}_\epsilon f := \sum_{|c^e_I| > \epsilon} c^e_I H^e_I$,

the thresholding of $f$ at level $\epsilon$, and for each positive integer $N$

(8.20)  $\mathcal{G}_N f := \sum_{(I,e) \in E_N(f)} c^e_I H^e_I$,

the best approximation of $f$ from $\Sigma_N$, where the set $E_N(f)$ contains the indices of the $N$ largest Haar coefficients $c^e_I$ of $f$. In the case of ties in the size of the coefficients, we make an arbitrary assignment to the set $E_N(f)$ in order to remove the ambiguity.
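
The two operators can be illustrated on a function that is piecewise constant on a $2^k \times 2^k$ grid. The following is a minimal sketch and not the authors' implementation: all function names are hypothetical, the input is a plain nested list of cell values, and the sign patterns chosen for the three Haar functions on each cube are one standard convention. Coefficients are normalized so that the Haar functions are orthonormal in $L_2(Q)$.

```python
def haar_coefficients(f):
    """Mean value and orthonormal Haar coefficients of a 2^k x 2^k cell array.

    Returns (mean, coeffs) with coeffs keyed by (level, row, col, e), e = 1, 2, 3.
    """
    n = len(f)
    avg = [[float(v) for v in row] for row in f]
    coeffs = {}
    level = n.bit_length() - 1  # assumes n == 2**level
    while n > 1:
        n //= 2
        level -= 1
        side = 1.0 / n          # side length of the cubes at this level
        scale = side / 4.0      # |I|^{-1/2} * |I| / 4 = |I|^{1/2} / 4
        parent = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                a, b = avg[2 * i][2 * j], avg[2 * i][2 * j + 1]
                c, d = avg[2 * i + 1][2 * j], avg[2 * i + 1][2 * j + 1]
                parent[i][j] = (a + b + c + d) / 4.0
                coeffs[(level, i, j, 1)] = scale * (a - b + c - d)
                coeffs[(level, i, j, 2)] = scale * (a + b - c - d)
                coeffs[(level, i, j, 3)] = scale * (a - b - c + d)
        avg = parent
    return avg[0][0], coeffs


def threshold(coeffs, eps):
    """Coefficients kept by thresholding at level eps, as in (8.19)."""
    return {key: c for key, c in coeffs.items() if abs(c) > eps}


def best_n_term(coeffs, N):
    """The N largest coefficients in absolute value, as in (8.20)."""
    keys = sorted(coeffs, key=lambda key: abs(coeffs[key]), reverse=True)[:N]
    return {key: coeffs[key] for key in keys}


def squared_error(coeffs, kept):
    """Squared L2 error of the approximation: sum of squares of dropped coefficients."""
    return sum(c * c for key, c in coeffs.items() if key not in kept)


if __name__ == "__main__":
    f = [[4, 0, 0, 0], [0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
    mean, coeffs = haar_coefficients(f)
    print(mean, squared_error(coeffs, threshold(coeffs, 0.2)))
```

Because the Haar system is orthonormal, the squared error of either approximation is the sum of the squares of the discarded coefficients, which is what `squared_error` computes. Ties in `best_n_term` are broken by the sort order, one instance of the arbitrary assignment mentioned above.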

Theorem 8.2. If $f \in BV$ has mean value zero on $Q$, we have

(8.21)  $\|f - \mathcal{H}_\epsilon f\|_{L_2(Q)} \le C_2 [\epsilon V_Q(f)]^{1/2}$,

and

(8.22)  $\inf_{g \in \Sigma_N} \|f - g\|_{L_2(Q)} = \|f - \mathcal{G}_N f\|_{L_2(Q)} \le C_3 N^{-1/2} V_Q(f)$,

with $C_2 = 2\sqrt{C_1}$ and $C_3 = C_1$, where $C_1$ is the constant of Theorem 8.1.

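Proof. If $\epsilon \ge V_Q(f)$, then (8.21) follows trivially from the embedding theorem (Theorem 4.1 and Remark 4.2). We can therefore assume $V_Q(f) > \epsilon$ in going further. For each $n$, let $\gamma_n := \gamma_n(f)$ denote the $n$-th largest Haar coefficient of $f$ in absolute value and for each $k = 0, 1, \ldots$, let $\Lambda_k := \{n : \gamma_n \le 2^{-k}\epsilon\}$. We then have

(8.23)  $\|f - \mathcal{H}_\epsilon f\|^2_{L_2(Q)} = \sum_{n \in \Lambda_0} \gamma_n^2 = \sum_{k \ge 0} \sum_{n \in \Lambda_k \setminus \Lambda_{k+1}} \gamma_n^2 \le \epsilon^2 \sum_{k \ge 0} 2^{-2k} |\Lambda_k \setminus \Lambda_{k+1}|$.

For each $n \in \Lambda_k \setminus \Lambda_{k+1}$, we have $\gamma_n > 2^{-k-1}\epsilon$ and hence from Theorem 8.1, $|\Lambda_k \setminus \Lambda_{k+1}| \le C_1 V_Q(f) 2^{k+1}/\epsilon$. Using this in (8.23) we arrive at (8.21).

For (8.22), we have from Theorem 8.1,

$$\|f - \mathcal{G}_N f\|^2_{L_2(Q)} = \sum_{n \ge N+1} \gamma_n^2 \le C_1^2 V_Q(f)^2 \sum_{n \ge N+1} n^{-2} \le C_1^2 V_Q(f)^2 N^{-1}.$$ □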
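
Both estimates only use the weak $\ell_1$ bound $\gamma_n \le C_1 V_Q(f)/n$, so they can be illustrated on the extremal model sequence $\gamma_n = C/n$. The sketch below is a numerical illustration under that assumption, with hypothetical names and a finite truncation of the sequence; the constant `C` plays the role of $C_1 V_Q(f)$.

```python
def tail_energy(gammas, N):
    """Sum of squares of all entries except the N largest, as in (8.22)."""
    ordered = sorted(gammas, reverse=True)
    return sum(g * g for g in ordered[N:])


def threshold_energy(gammas, eps):
    """Sum of squares of the entries not exceeding eps, as in (8.23)."""
    return sum(g * g for g in gammas if g <= eps)


C = 1.0  # stands in for C_1 * V_Q(f)
gammas = [C / n for n in range(1, 100001)]

if __name__ == "__main__":
    for N in (1, 10, 100):
        # (8.22) squared: the tail energy is at most C^2 / N.
        print(N, tail_energy(gammas, N), C * C / N)
    for eps in (0.1, 0.01):
        # (8.21) squared: the discarded energy is at most C_2^2 * eps * V = 4 * C * eps.
        print(eps, threshold_energy(gammas, eps), 4 * C * eps)
```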

9.  Minimization  of  the  K  and  U-functionals  by  Haar  thresholding. 

We shall now show that Haar thresholding provides near minimizers for (1.1) and (1.2). To this end, we first prove a stability result concerning the nonlinear operators that we have introduced in the previous section.

Theorem 9.1. The operators $\mathcal{G}_N$ and $\mathcal{H}_\epsilon$ satisfy, for all $\epsilon > 0$, $N > 0$ and $f \in BV(Q)$,

(9.1)  $V_Q(\mathcal{G}_N f) \le C_4 V_Q(f)$,

and

(9.2)  $V_Q(\mathcal{H}_\epsilon f) \le C_4 V_Q(f)$,

with $C_4 = 10 + 28\sqrt{2}(18\sqrt{3} + C_3)$ and $C_3$ the constant of Theorem 8.2.

Proof. Clearly, it suffices to prove (9.1) since $\mathcal{H}_\epsilon f = \mathcal{G}_N f$ for some $N = N(\epsilon)$. Let $g$ be a best approximation to $f$ from $\Sigma^r_N$. We can write $g = P_{\mathcal{P}} f$ with $\mathcal{P}$ the partition associated
to  g.  Recall  that  each  element  of  YhrN  is  in  and  a^so  Qn f  is  in  £4^-  Therefore,  we 

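have

$$\begin{aligned}
V_Q(\mathcal{G}_N f) &\le V_Q(g) + V_Q(\mathcal{G}_N f - g) \\
&\le 10 V_Q(f) + \frac{28}{\sqrt{3}} (6N)^{1/2} \|\mathcal{G}_N f - g\|_{L_2(Q)} \\
&\le 10 V_Q(f) + 28\sqrt{2}\, N^{1/2} \left[ \|f - g\|_{L_2(Q)} + \|f - \mathcal{G}_N f\|_{L_2(Q)} \right] \\
&\le [10 + 28\sqrt{2}(18\sqrt{3} + C_3)]\, V_Q(f),
\end{aligned}$$

where we have used Theorem 5.1 to estimate $V_Q(g)$ and the inverse estimate (3.9) for $\Sigma^r_N$ as well as the direct estimates (6.21) and (8.22) in the estimate of $V_Q(\mathcal{G}_N f - g)$. □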
Remark 9.1. The stability of the Haar thresholding is a quite surprising result, since the operation of discarding coefficients is in general not uniformly stable in BV (i.e. stable independently of the set of coefficients which is discarded). Also, in the proof of this result we have made use of our approximation results for $\Sigma^r_N$; a more direct proof of this stability is still to be found. Note that we also have used decompositions into rings to prove that the Haar coefficients of a BV function are in weak $\ell_1$, leaving open the possibility of a more direct proof.

Theorem 9.2. For each $N \ge 1$, and each $f \in L_2(Q)$, we have

(9.3)  $\|f - \mathcal{G}_N f\|_{L_2(Q)} + N^{-1/2} V_Q(\mathcal{G}_N f) \le C_5 K(f, N^{-1/2})$,

with $C_5 = \left( \frac{112\sqrt{2}}{\sqrt{3}} + 1 \right) C_3 + C_4$, where $C_3$ is the constant of Theorem 8.2 and $C_4$ the constant of Theorem 9.1.

Proof. Let $g$ be any function in $BV(Q)$. Since $\mathcal{G}_N f$ is the best $N$-term approximation to $f$, we have

$$\begin{aligned}
\|f - \mathcal{G}_N f\|_{L_2(Q)} &\le \|f - \mathcal{G}_N g\|_{L_2(Q)} \\
&\le \|f - g\|_{L_2(Q)} + \|g - \mathcal{G}_N g\|_{L_2(Q)} \\
&\le \|f - g\|_{L_2(Q)} + C_3 N^{-1/2} V_Q(g),
\end{aligned}$$

where the last inequality uses Theorem 8.2. The function $\mathcal{G}_N f - \mathcal{G}_N g$ is in $\Sigma^r_{8N}$. We can therefore use the Bernstein inequality (3.9) and Theorem 9.1 to obtain

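$$\begin{aligned}
N^{-1/2} V_Q(\mathcal{G}_N f) &\le N^{-1/2} \left[ V_Q(\mathcal{G}_N f - \mathcal{G}_N g) + V_Q(\mathcal{G}_N g) \right] \\
&\le \frac{56\sqrt{2}}{\sqrt{3}} \|\mathcal{G}_N f - \mathcal{G}_N g\|_{L_2(Q)} + C_4 N^{-1/2} V_Q(g) \\
&\le \frac{112\sqrt{2}}{\sqrt{3}} \|f - \mathcal{G}_N g\|_{L_2(Q)} + C_4 N^{-1/2} V_Q(g) \\
&\le \frac{112\sqrt{2}}{\sqrt{3}} \|f - g\|_{L_2(Q)} + \left( \frac{112\sqrt{2}}{\sqrt{3}} C_3 + C_4 \right) N^{-1/2} V_Q(g).
\end{aligned}$$

Combining these two estimates, we obtain

(9.4)  $\|f - \mathcal{G}_N f\|_{L_2(Q)} + N^{-1/2} V_Q(\mathcal{G}_N f) \le C_5 \left[ \|f - g\|_{L_2(Q)} + N^{-1/2} V_Q(g) \right]$.

Taking an infimum over all $g \in BV(Q)$ gives (9.3). □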

Our next result concerns the minimization of the $U$-functional, i.e. problem (1.1). As in the case of the Besov space $B^1_1(L_1)$, a thresholding procedure, now in the Haar system, yields the approximate minimizer.

Theorem 9.3. For each $\epsilon > 0$, and each $f \in L_2(Q)$, we have

(9.5)  $\|f - \mathcal{H}_\epsilon f\|^2_{L_2(Q)} + \epsilon V_Q(\mathcal{H}_\epsilon(f)) \le C_6 U(f, \epsilon)$,

with $C_6 = C_4 + 112 C_2^2 + 4C_1 + 2$, where $C_1$ is the constant of Theorem 8.1, $C_2$ the constant of Theorem 8.2 and $C_4$ the constant of Theorem 9.1.

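Proof. Let $g$ be any function in $BV(Q)$. We first remark that we have

(9.6)  $\|f - \mathcal{H}_\epsilon f\|_{L_2(Q)} \le \|f - \mathcal{H}_{2\epsilon} g\|_{L_2(Q)}$.

Indeed, if the coefficient $c_I(f - \mathcal{H}_\epsilon f) = \langle f - \mathcal{H}_\epsilon f, H_I \rangle$ is nonzero, then necessarily $|c_I(f)| \le \epsilon$ and $c_I(f - \mathcal{H}_\epsilon f) = c_I(f)$. For this coefficient, we either have $|c_I(g)| \le 2\epsilon$, in which case

(9.7)  $c_I(f - \mathcal{H}_\epsilon f) = c_I(f) = c_I(f - \mathcal{H}_{2\epsilon} g)$,

or $|c_I(g)| > 2\epsilon$, in which case

(9.8)  $|c_I(f - \mathcal{H}_{2\epsilon} g)| = |c_I(f) - c_I(g)| > \epsilon \ge |c_I(f - \mathcal{H}_\epsilon f)|$.

In all cases the coefficients of $f - \mathcal{H}_{2\epsilon} g$ dominate those of $f - \mathcal{H}_\epsilon f$, so that (9.6) holds. We thus have

(9.10)  $\|f - \mathcal{H}_\epsilon f\|^2_{L_2(Q)} \le 2\|f - g\|^2_{L_2(Q)} + 2\|g - \mathcal{H}_{2\epsilon} g\|^2_{L_2(Q)} \le 2\|f - g\|^2_{L_2(Q)} + 4 C_2^2\, \epsilon V_Q(g)$,

where we have used (8.21) of Theorem 8.2.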

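We now estimate the variation of $\mathcal{H}_\epsilon f$ as follows: using Theorem 9.1, we obtain

(9.11)  $V_Q(\mathcal{H}_\epsilon f) \le V_Q(\mathcal{H}_\epsilon f - \mathcal{H}_\epsilon g) + V_Q(\mathcal{H}_\epsilon g) \le V_Q(\mathcal{H}_\epsilon f - \mathcal{H}_\epsilon g) + C_4 V_Q(g)$.

We are left with estimating the variation of $\mathcal{H}_\epsilon f - \mathcal{H}_\epsilon g$. For this, we write

(9.12)  $\mathcal{H}_\epsilon f - \mathcal{H}_\epsilon g = \mathcal{H}_\epsilon[\mathcal{H}_\epsilon f - \mathcal{H}_\epsilon g] + \tilde{\mathcal{H}}_\epsilon[\mathcal{H}_\epsilon f - \mathcal{H}_\epsilon g]$,

where for a function $h$, $\tilde{\mathcal{H}}_\epsilon h := h - \mathcal{H}_\epsilon h$ is the part of the Haar expansion of $h$ corresponding to the coefficients which satisfy $|c_I(h)| \le \epsilon$. Using the inverse estimate (3.4) of Remark 3.1 and then (9.10), we have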

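$$\begin{aligned}
V_Q(\mathcal{H}_\epsilon[\mathcal{H}_\epsilon f - \mathcal{H}_\epsilon g]) &\le 8\epsilon^{-1} \|\mathcal{H}_\epsilon f - \mathcal{H}_\epsilon g\|^2_{L_2(Q)} \le 16\epsilon^{-1} \left[ \|\mathcal{H}_\epsilon f - f\|^2_{L_2(Q)} + \|f - \mathcal{H}_\epsilon g\|^2_{L_2(Q)} \right] \\
&\le 16\epsilon^{-1} \left[ 2\|f - g\|^2_{L_2(Q)} + 4 C_2^2\, \epsilon V_Q(g) + 2\|f - g\|^2_{L_2(Q)} + 2\|g - \mathcal{H}_\epsilon g\|^2_{L_2(Q)} \right] \\
&\le 16\epsilon^{-1} \left[ 4\|f - g\|^2_{L_2(Q)} + 6 C_2^2\, \epsilon V_Q(g) \right],
\end{aligned}$$

where the last inequality again uses (8.21) of Theorem 8.2.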

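It remains to estimate the variation of $\tilde{\mathcal{H}}_\epsilon[\mathcal{H}_\epsilon f - \mathcal{H}_\epsilon g]$. For this, we remark that if $0 < |c_I(\mathcal{H}_\epsilon f - \mathcal{H}_\epsilon g)| \le \epsilon$, then necessarily $|c_I(g)| > \epsilon$. In other words, if we denote by $N_g(\epsilon)$ the number of coefficients of $g$ above the threshold $\epsilon$, we see that $\tilde{\mathcal{H}}_\epsilon[\mathcal{H}_\epsilon f - \mathcal{H}_\epsilon g]$ has at most $N_g(\epsilon)$ non-zero coefficients. We can then use the inverse estimate (3.1) of Theorem 3.1 to obtain

(9.13)  $V_Q(\tilde{\mathcal{H}}_\epsilon[\mathcal{H}_\epsilon f - \mathcal{H}_\epsilon g]) \le 8 [N_g(\epsilon)]^{1/2} \|\mathcal{H}_\epsilon f - \mathcal{H}_\epsilon g\|_{L_2(Q)}$.

From Theorem 8.1, we have the estimate

(9.14)  $N_g(\epsilon) \le C_1 \epsilon^{-1} V_Q(g)$.

Combined with (9.13), this gives

$$\begin{aligned}
\epsilon V_Q(\tilde{\mathcal{H}}_\epsilon[\mathcal{H}_\epsilon f - \mathcal{H}_\epsilon g]) &\le 8\epsilon\, [C_1 \epsilon^{-1} V_Q(g)]^{1/2} \|\mathcal{H}_\epsilon f - \mathcal{H}_\epsilon g\|_{L_2(Q)} \\
&\le 4\epsilon \left[ C_1 V_Q(g) + \epsilon^{-1} \|\mathcal{H}_\epsilon f - \mathcal{H}_\epsilon g\|^2_{L_2(Q)} \right] \\
&\le 4 C_1 \epsilon V_Q(g) + 8\|f - \mathcal{H}_\epsilon f\|^2_{L_2(Q)} + 8\|f - \mathcal{H}_\epsilon g\|^2_{L_2(Q)} \\
&\le 4 C_1 \epsilon V_Q(g) + 16\|f - g\|^2_{L_2(Q)} + 32 C_2^2\, \epsilon V_Q(g) + 16\|f - g\|^2_{L_2(Q)} + 16\|g - \mathcal{H}_\epsilon g\|^2_{L_2(Q)} \\
&\le 4 C_1 \epsilon V_Q(g) + 32\|f - g\|^2_{L_2(Q)} + 32 C_2^2\, \epsilon V_Q(g) + 16 C_2^2\, \epsilon V_Q(g) \\
&\le 32\|f - g\|^2_{L_2(Q)} + (4 C_1 + 48 C_2^2)\, \epsilon V_Q(g),
\end{aligned}$$

where we have used (9.10) and (8.21) of Theorem 8.2.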

Combining all our estimates we obtain

(9.15)  $\|f - \mathcal{H}_\epsilon f\|^2_{L_2(Q)} + \epsilon V_Q(\mathcal{H}_\epsilon(f)) \le 98\|f - g\|^2_{L_2(Q)} + (C_4 + 148 C_2^2 + 4 C_1)\, \epsilon V_Q(g)$,

which gives (9.5) by taking the infimum over all $g \in BV$. □
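
The constants in (9.15) come from adding four bounds, and the bookkeeping can be replayed mechanically. In the sketch below (hypothetical names; the tuple encoding is only a device for this check), each bound of the form $A\|f - g\|^2_{L_2(Q)} + (B\,C_2^2 + C\,C_1 + D\,C_4)\,\epsilon V_Q(g)$ is stored as the tuple `(A, B, C, D)`.

```python
def add(*bounds):
    """Componentwise sum of bounds encoded as (A, B, C, D) tuples."""
    return tuple(sum(parts) for parts in zip(*bounds))


def scale(factor, bound):
    """Multiply every coefficient of a bound by a constant factor."""
    return tuple(factor * part for part in bound)


# (9.10): squared error of the thresholded function.
error_term = (2, 4, 0, 0)
# (9.11): eps * V_Q of the thresholded g, via Theorem 9.1.
variation_of_g = (0, 0, 0, 1)
# Estimate after (9.12): eps * V_Q of the part with coefficients above eps.
large_part = scale(16, (4, 6, 0, 0))
# Estimate after (9.14): eps * V_Q of the part with coefficients at most eps.
small_part = (32, 48, 4, 0)

total = add(error_term, variation_of_g, large_part, small_part)

if __name__ == "__main__":
    print(total)
```

The components of `total` are the coefficients of $\|f - g\|^2_{L_2(Q)}$, $C_2^2$, $C_1$ and $C_4$ that appear on the right side of (9.15).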

10. Interpolation spaces between $L_2$ and BV.

As a by-product of our results, we shall obtain several results concerning interpolation spaces between $L_2(Q)$ and $BV(Q)$. For each $0 < \alpha < 1$ and $0 < q \le \infty$, let $\mathcal{A}^\alpha_q(L_2(Q))$ denote the set of functions $f \in L_2(Q)$ such that

(10.1)  $|f|_{\mathcal{A}^\alpha_q(L_2(Q))} := \|(N^\alpha \sigma_N(f))\|_{\ell^*_q(\mathbb{Z}_+)} < \infty$,

where $\sigma_N(f) = \inf_{g \in B_N} \|f - g\|_{L_2(Q)}$, $B_N$ is any of the three families $\Sigma_N$, $\Sigma^r_N$ or $\Sigma^c_N$, and $\ell^*_q$ is the $\ell_q$ norm with respect to Haar measure:

$$\|(a_n)\|_{\ell^*_q} := \begin{cases} \left( \sum_{n=1}^{\infty} |a_n|^q \frac{1}{n} \right)^{1/q}, & 0 < q < \infty, \\ \sup_{n \ge 1} |a_n|, & q = \infty. \end{cases}$$

Then, it follows from the Jackson and Bernstein estimates, which were proved throughout the paper for these different families of approximation spaces, that

(10.2)  $\mathcal{A}^{\alpha/2}_q(L_2(Q)) = (L_2(Q), BV(Q))_{\alpha,q}$, $0 < \alpha < 1$, $0 < q \le \infty$,

with equivalent norms, where $(L_2(Q), BV(Q))_{\alpha,q}$ are the real interpolation spaces for the pair $(L_2(Q), BV(Q))$ (see [DL, Chapter 5] for the definition of interpolation spaces and for the general mechanism relating these with approximation spaces, through Jackson and Bernstein estimates).

Moreover, it was shown in [DP] that

(10.3)  $\mathcal{A}^{\alpha/2}_q(L_2(Q)) = (L_2(Q), B^1_1(L_1(Q)))_{\alpha,q}$,

in the case of the particular family $\Sigma^c_N$.

We thus obtain the following corollary to our results, where the second statement exploits the known interpolation results for Besov spaces (see [T] or [DPI]).

Corollary 10.1. We have

(10.4)  $(L_2(Q), BV(Q))_{\alpha,q} = (L_2(Q), B^1_1(L_1(Q)))_{\alpha,q}$, $0 < \alpha < 1$, $0 < q \le \infty$,

and in particular

(10.5)  $(L_2(Q), BV(Q))_{\alpha,q} = B^\alpha_q(L_q(Q))$, $0 < \alpha < 1$, $1/q = 1/2 + \alpha/2$.